(54) Process for selecting candidate drug compounds



(57) The invention relates to a process for drug candidate identification, said process comprising the steps of:

(1) obtaining a computerised representation of the 
three-dimensional structure of a binding site on the 
surface of a biological macromolecule; 

(2) generating a computerised model of the functional structure of said binding site which may be used to identify favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment (or "template" T) capable of placement within said binding site and capable of carrying at least one (preferably a plurality (ie. at least two) and especially preferably at least 3) substituent group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds (eg. a1-A, a2-A, a3-A, etc, b1-B, b2-B, b3-B, etc, c1-C, c2-C, c3-C, etc), the lists being such that a combination of compounds taken from each list (eg. a1-A, b3-B and c11-C) may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality of substituent groups (eg. a1b3c11T) thereby generating a first virtual library of candidate compounds being the theoretical set of compounds producible by reaction of the members of said lists (ie. a1b1c1T, a1b1c2T, a1b2c1T, etc), each member of each list comprising a component (eg. A, B, C, etc.) common to the other members of that list and a component (eg. a1, b1, c1, etc) unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each member of each list computerised comparison for favourable or unfavourable interactions between said computerised model and a structure comprising said molecular fragment and a substituent deriving from the unique component within said list of that member, the molecular fragment and the computerized model being held in fixed spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of candidate compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps (4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound library;

(10) experimentally evaluating the compounds of 
said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information;

(12) using the information derived in step (11), selecting a revised set of lists of accessible reagent compounds, said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further 
compounds which are candidates for synthesis and 
experimental evaluation for drug efficacy; 

(14) synthesising and experimentally evaluating 
said further compounds for drug efficacy; 

(15) if required repeating steps (11) to (14) one or 
more times; 

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above.

The process of the invention is characterised by the rapid generation of a relatively small set of readily synthesisable candidate compounds with a high success rate in terms of drug efficacy and hence a high predictive value for directing subsequent iterations.

Description

FIELD OF THE INVENTION 

This invention relates to a process for selecting lead candidate drug compounds, and in particular to such a process in which synthesis of candidate compounds is simplified and minimized and the success rate with synthesized compounds is maximized.

BACKGROUND OF THE INVENTION 

Drug discovery has been a time- and resource-consuming exercise. Traditionally, key steps in drug discovery have included the identification of a compound or set of compounds having the desired drug property, the identification of the active structure within such compounds, and the identification of a lead candidate, a compound which incorporates that structure and combines adequate activity with acceptable toxicity and synthetic accessibility. By acceptable synthetic accessibility it is meant that the lead compound should be producible via a synthetic route which is sufficiently straightforward and inexpensive that commercial production of the compound is a viable option.

The identification of active compounds has involved screening of extensive compound libraries for the desired drug property. Recently, the technique known as combinatorial chemistry has offered a moderate-cost route to the synthesis of very large compound libraries which can be screened in this manner. Although it is now increasingly being applied to the synthesis of libraries of non-peptide organic molecules, the combinatorial chemistry technique is especially applicable to the production of libraries of peptide and peptoid compounds, and synthesis and testing of such compound libraries can even be automated and operated under computer control. Thus, for example, an alternative approach to drug discovery using computer-controlled combinatorial chemistry is described by 3-Dimensional Pharmaceuticals Inc. in WO-A-96/08781.

Unfortunately, however, the peptide and peptoid compounds for which such a combinatorial chemistry approach is particularly suited, due to the ease with which peptide molecules can be produced with a multiplicity of sequences on automated peptide synthesizers, often display undesirable pharmacokinetics, such as poor bioavailability.

An alternative approach to drug discovery has also developed over recent years. This approach, referred to variously as Structure-Based Drug Design (SBDD) or Computer Aided Molecular Design (CAMD), involves structural analysis of the receptor site for the drug molecule and can involve computerized generation of a molecular structure which is capable of binding to that site, ie. a structure which has an appropriate structural framework to fit within the receptor site and which is so functionalized as to have favourable interactions with selected functional components of the receptor site.

One example of an SBDD system is the PRO_LIGAND system of Proteus Molecular Design Limited. This is described, for example, by Clark et al in a series of papers: J. Comput.-Aided Mol. Design 9: 13-32 (1995), 9: 139-148 (1995), 9: 213-225 (1995) and 9: 381-395 (1995); J. Med. Chem. 37: 3994-4002 (1994); and J. Chem. Inf. Comput. Sci. 35: 914-923 (1995).

While highly effective, SBDD serves to generate and assess molecular structures on the basis of predicted activity without particular regard to synthetic accessibility. These molecules must then be made and tested, and subsequent optimization to produce a lead candidate may require time-consuming, complicated or expensive chemical syntheses.

It has now been recognised that by combining certain of the features of combinatorial chemistry with certain features of SBDD one can produce a drug discovery system in which only a relatively limited compound library need be generated before a range of active compounds is identified, that that range of active compounds may provide sufficient structure-activity relationship information for a lead candidate to be identified with relatively little iteration (ie. relatively little extension of the library that is initially generated and tested), and that the library may be generated on rational principles ensuring that the vast majority of compounds in the library are synthetically readily accessible.

In other words, in using the process of the invention to generate the structure-activity information necessary to identify a lead candidate, one may avoid the need to make and test the large compound libraries required by prior art routine screening or by combinatorial chemistry and, unlike prior art SBDD techniques, the active compounds identified will implicitly be synthetically readily accessible.

Thus viewed from one aspect the invention provides a process for drug candidate identification, said process comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment (or 'template' T) capable of placement within said binding site and capable of carrying at least one (preferably a plurality (ie. at least two) and especially preferably at least 3) substituent group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds (eg. a1-A, a2-A, a3-A, etc, b1-B, b2-B, b3-B, etc, c1-C, c2-C, c3-C, etc), the lists being such that a combination of compounds taken from each list (eg. a1-A, b3-B and c11-C) may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality of substituent groups (eg. a1b3c11T) thereby generating a first virtual library of candidate compounds being the theoretical set of compounds producible by reaction of the members of said lists (ie. a1b1c1T, a1b1c2T, a1b2c1T, etc), each member of each list comprising a component (eg. A, B, C, etc.) common to the other members of that list and a component (eg. a1, b1, c1, etc) unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each member of each list computerised comparison for favourable or unfavourable interactions between said computerised model and a structure comprising said molecular fragment and a substituent deriving from the unique component within said list of that member, the molecular fragment and the computerized model being held in fixed spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of candidate compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps (4), (5) and (6) to generate an alternative third virtual library;



(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound library;



(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information;

(12) using the information derived in step (11), selecting a revised set of lists of accessible reagent compounds, said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above.
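Steps (9) to (16) form a design, make, test and revise loop. The Python sketch below shows only the control flow, under stated assumptions. `design`, `assay` and `revise` are hypothetical callables standing in for steps (5) to (8), synthesis plus assay, and the SAR-driven list revision. Higher assay values are assumed to be better, and step (11) is reduced to "best compound so far". None of these names comes from the patent.

```python
from itertools import product


def discovery_cycle(lists, design, assay, revise, max_rounds=3, target=None):
    """Skeleton of steps (9) to (16); all callables are hypothetical stand-ins.

    design(lists)          -> candidate labels (in place of steps (5)-(8))
    assay(compound)        -> measured activity, higher is better
    revise(lists, results) -> revised reagent lists (steps (11)/(12))
    """
    tested = {}
    best = None
    for _ in range(max_rounds):                  # step (15): repeat as required
        for compound in design(lists):
            if compound not in tested:           # only synthesise what is new
                tested[compound] = assay(compound)
        if tested:
            best = max(tested, key=tested.get)   # step (16): current lead
            if target is not None and tested[best] >= target:
                break
        lists = revise(lists, tested)
    return best, tested


# Toy stand-ins; every name and number below is hypothetical.
def toy_design(lists):
    return ["".join(c) + "T" for c in product(*lists.values())]


activity = {"a1b1T": 2.0, "a2b1T": 3.5, "a4b1T": 7.2}


def toy_assay(compound):
    return activity.get(compound, 0.0)


def toy_revise(lists, results):
    # Pretend the SAR analysis suggests reinstating analogue a4 in list A.
    if "a4" in lists["A"]:
        return lists
    return {**lists, "A": lists["A"] + ["a4"]}


best, results = discovery_cycle({"A": ["a1", "a2"], "B": ["b1"]},
                                toy_design, toy_assay, toy_revise,
                                max_rounds=3, target=5.0)
print(best, results[best])  # a4b1T 7.2
```

In the second round only `a4b1T` is new, so only one extra "synthesis" is made. This reflects the point that later iterations extend the tested library rather than rebuild it.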

Viewed from an alternative aspect the invention provides a method of manufacturing a drug substance, said method comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify
favourable and unfavourable interactions between the binding site and a drug candidate molecule; 

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at least 
one substituent group, said molecular fragment either being capable of being synthesized from reagent compounds 
accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment 
or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction 
with further accessible reagent compounds; 

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of compounds 
taken from each list may be reacted to produce a candidate compound comprising said molecular fragment carrying 
a plurality of substituent groups thereby generating a first virtual library of candidate compounds being the theo- 
retical set of compounds producible by reaction of the members of said lists, each member of each list comprising 
a component common to the other members of that list and a component unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate 
a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each 
member of each list computerised comparison for favourable or unfavourable interactions between said
computerised model and a structure comprising said molecular fragment and a substituent deriving from the unique
component within said list of that member, the molecular fragment and the computerised model being held in fixed 
spatial relationship to each other for said comparison; 

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable 
interactions with said computerised model and thereby generating a restricted third virtual library of candidate 
compounds ranked as having favourable interactions; 

(7) optionally selecting from said third virtual library at least one further molecular fragment and repeating steps 
(4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth 
virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental
evaluation for drug efficacy; 

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound 
library; 

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information;

(12) using the information derived in step (11), selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and 
optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental
evaluation for drug efficacy; 

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times; 

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above;

(17) manufacturing the compound identified in step (16) above; and, optionally, 

(18) admixing the compound manufactured in step (17) above with at least one pharmaceutically acceptable carrier
or excipient.

By "accessible" it is meant that the reagent compounds are commercially available or are synthesizable, preferably via relatively simple routes, from commercially available materials, eg. using known synthetic procedures.

By a "virtual library" is meant the set of compounds theoretically attainable by the inter-reaction of the reagents in the reagent lists for that library. By contrast, by "compound library" or "candidate compound library" is meant a library of compounds that have been synthesized.

In the process of the invention, the reagent lists in the first iteration can conveniently be limited to select only one reagent of any group in which the unique substituents (ie. a1, a2, a3 etc) are closely analogous (eg. optical isomers, alkylene chain length homologs, equivalently substituted compounds (eg. compounds substituted by different halo atoms), etc). Similarly, while the reagent lists may initially include both commercially available compounds and transformation products of such compounds, for the first iteration (eg. for the first performance of step (6) for a given template) the lists may be limited to exclude reagents which, although commercially available, have a long delivery time or are particularly expensive, as well as transformations which, although synthetically feasible, are complex, have poor yields or require significant purification, etc. In the second and further iterations, however, such analogs or less readily available compounds can be reinstated within the reagent lists where the structure-activity information derived in step (11) suggests that they may also lead to effective compounds.
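The first-iteration restriction described above keeps one representative per group of close analogues and drops reagents that are expensive or slow to deliver. Here is a minimal Python sketch of that rule. It assumes each reagent has already been tagged with a hypothetical analogue-group label, a cost and a delivery time. The thresholds and the tie-break (cheapest, then fastest) are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reagent:
    name: str
    analogue_group: str  # e.g. optical isomers or chain-length homologues share a group
    cost: float          # hypothetical price units
    delivery_days: int


def first_iteration_list(reagents, max_cost, max_delivery_days):
    """Keep one cheap, quickly delivered representative per analogue group."""
    chosen = {}
    for r in reagents:
        if r.cost > max_cost or r.delivery_days > max_delivery_days:
            continue  # left out for now; may be reinstated in a later iteration
        current = chosen.get(r.analogue_group)
        if current is None or (r.cost, r.delivery_days) < (current.cost, current.delivery_days):
            chosen[r.analogue_group] = r
    return sorted(chosen.values(), key=lambda r: r.name)


reagents = [
    Reagent("r_amine", "amine_1", 20.0, 3),
    Reagent("s_amine", "amine_1", 35.0, 3),        # optical isomer of r_amine
    Reagent("ethyl_ester", "ester_homologues", 5.0, 2),
    Reagent("propyl_ester", "ester_homologues", 8.0, 2),
    Reagent("rare_acid", "acid_1", 900.0, 60),     # expensive and slow to arrive
]
picked = first_iteration_list(reagents, max_cost=100.0, max_delivery_days=14)
print([r.name for r in picked])  # ['ethyl_ester', 'r_amine']
```

Excluded reagents are only skipped, not deleted from the source list. Reinstating them in step (12) therefore amounts to calling the function again with relaxed limits or with a chosen group expanded.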

In this way the process of the invention may be carried out in such a way that expensive reagents or complex 
synthetic chemistry is only required at a stage where candidate compound efficacy is already established and the drug 
discovery process is closing in on a lead candidate. 

Nonetheless, by including within the reagent lists not just compounds which are readily available commercially but also chemical transformation products of available reagents, the theoretical size of the virtual libraries is vastly expanded and the capability of the process to assess (and if necessary readily synthesize and test) modifications of the compounds revealed to be active is greatly increased.

Computational requirements may also readily be reduced by limiting the precision of the computer evaluation and ranking of the virtual libraries for the initial performance of steps (5) and/or (6), eg. by limiting the conformational freedom of the compound under evaluation or by specifying the position and orientation of the compound within the binding site model. In this way one need only bring in more complex and accurate evaluation systems for subsequent iterations or for sets of library members which would require the use of expensive or less readily accessible reagents for their production. Indeed, this use of SBDD concepts to screen a virtual library of accessible compounds before synthesis and experimental evaluation of the resulting limited real library may even render the use of highly computationally demanding programs unnecessary, ie. a "quick and dirty" computational evaluation may be entirely adequate.
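One way to realise the graded precision described above is a two-pass screen. A cheap score is applied to everything, and the expensive score is reserved for a shortlist. The Python sketch below assumes both scores are supplied as plain functions (lower is more favourable). Here they are toy lookup tables, hypothetically standing for a single rigid pose and a conformationally flexible evaluation respectively.

```python
def tiered_screen(compounds, cheap_score, expensive_score, shortlist_size, keep):
    """Two-pass screen (lower score = more favourable).

    Pass 1 applies a fast, low-precision score to every compound; pass 2
    re-scores only the best `shortlist_size` with a slower, more accurate one.
    Returns the `keep` best compounds and the number of expensive evaluations.
    """
    shortlist = sorted(compounds, key=cheap_score)[:shortlist_size]
    refined = sorted(shortlist, key=expensive_score)
    return refined[:keep], len(shortlist)


# Hypothetical scores: 'cheap' might come from a single fixed pose,
# 'fine' from a run with conformational freedom.
cheap = {"m1": -5.0, "m2": -7.5, "m3": -1.0, "m4": -6.8, "m5": 0.3, "m6": -4.9}
fine = {"m1": -6.1, "m2": -6.9, "m4": -8.2, "m6": -5.0}

top, n_expensive = tiered_screen(list(cheap), cheap.__getitem__, fine.__getitem__,
                                 shortlist_size=4, keep=2)
print(top, n_expensive)  # ['m4', 'm2'] 4
```

The refined pass can reorder the shortlist: `m4` overtakes `m2`. The risk is that a compound dropped by the cheap pass is never reconsidered. A generous shortlist size is the obvious guard against that.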

If desired, using optional step (7) in the process of the invention one may refine the selection of the template. Thus, for example, step (3) of the method of the invention may be effected by selecting as the template an "anchor" group (Anch) suitable for binding to a section of the binding site, eg. an aryl or other lipophilic group capable of binding to a lipophilic region of the binding site, and performing steps (4) to (6) to identify a further set of templates (Tn) which can be coupled at a substitution site to the anchor group, thereby to generate a virtual library of anchor-template (Anch-Tn) compounds, and to evaluate and rank by computer the members of the anchor-template library for favourable and unfavourable interactions with the computerised model (the Binding Site Model) whereby to produce a restricted set of templates ranked as having favourable interactions, one, more or all of which may serve as the molecular fragment for the reiteration of steps (4) to (6). In this way, by a rational selection of the template, the success rate for the subsequent synthesis and testing of the candidate compound library, and hence the predictive value of the structure-activity information derived therefrom which is the hallmark of the process of the present invention, is still further enhanced.

Indeed, the process of the invention is characterised by the rapid generation of a relatively small set of readily synthesisable candidate compounds with a high success rate in terms of drug efficacy and hence a high predictive value for directing subsequent iterations. Unlike standard combinatorial chemistry, the technique is not "blind", relying on random success in the testing of a large library; and unlike SBDD, the technique inherently produces compounds which are readily synthesizable and readily modified to home in on a lead candidate.

While the rational selection of a template starting from a list of anchor-template candidates as discussed above is a preferred aspect of the invention, template selection may also be on the basis of knowledge of the macromolecular target or of knowledge of compounds known to bind to or otherwise influence the target. By way of example, a template may be designed to mimic the structure and conformation of a compound known to have the desired drug efficacy while itself having a structure that is accessible either by combination of two or more reagents having the "unique/common" component structure referred to above or directly as a readily substitutable reagent compound (a template reagent).

Once the template or templates have been selected, the "common" components of the reagents are implicitly identified either as functional groups that will react with groups on a template reagent or as groups which will react together to generate the template. The reagent lists may then be generated by searching a computer database of available chemicals (such as the ACD database of MDL Information Systems Inc.). These lists may then be supplemented by inclusion of accessible transformations of the compounds on the lists, eg. oxidation, reduction or substitution products of such compounds, as well as salts, esters and other feasible chemical transformations. If desired, the members of the resulting lists may be grouped into groups deemed likely to have similar properties in any resulting candidate compounds and also ranked in terms of accessibility (ie. expense, delivery delay, complexity of any required transformation, etc).
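The list-building procedure above searches by the reactive "common" component, adds one-step transformation products, and ranks the result by accessibility. The Python sketch below illustrates it under several assumptions. The catalogue is a small in-memory list rather than a query against ACD or any real database. The transformation table is hypothetical. "Accessibility" is reduced to the number of extra steps followed by the parent reagent's cost.

```python
# Hypothetical one-step transformations: functional group -> [(product group, label)]
TRANSFORMATIONS = {
    "nitro": [("amine", "reduction")],
    "ester": [("carboxylic acid", "hydrolysis")],
    "primary alcohol": [("aldehyde", "oxidation")],
}


def expand_with_transformations(reagents):
    """reagents: list of (name, functional_group, cost) tuples.

    Returns (name, functional_group, cost, extra_steps) tuples for the original
    reagents plus their one-step transformation products, ranked by accessibility:
    fewer extra synthetic steps first, then lower cost (parent cost is reused).
    """
    expanded = [(name, group, cost, 0) for name, group, cost in reagents]
    for name, group, cost in reagents:
        for new_group, label in TRANSFORMATIONS.get(group, []):
            expanded.append((f"{label}({name})", new_group, cost, 1))
    return sorted(expanded, key=lambda entry: (entry[3], entry[2]))


catalogue = [  # illustrative entries, not real database records or prices
    ("4-nitrotoluene", "nitro", 10.0),
    ("benzylamine", "amine", 12.0),
    ("aniline", "amine", 8.0),
    ("ethyl benzoate", "ester", 6.0),
]
# Suppose the template chemistry needs an amine as the "common" component.
amine_list = [e[0] for e in expand_with_transformations(catalogue) if e[1] == "amine"]
print(amine_list)  # ['aniline', 'benzylamine', 'reduction(4-nitrotoluene)']
```

The reduction product of 4-nitrotoluene (p-toluidine) enters the amine list, but it ranks behind the directly purchasable amines because it needs an extra step.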

The limiting of the lists of reagent compounds in step (5) may conveniently, as discussed above, comprise restriction of groups of compounds deemed analogous, elimination of low-accessibility compounds, and elimination of compounds with too high a molecular weight, too large a total atom count, or substituent groups thought to possess undesired properties (eg. charge, or reactivity with other groups such that template production may be hindered).
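The non-structural exclusion rules listed above (molecular weight, total atom count, undesired groups) are simple predicates. Below is a minimal Python sketch. It assumes the properties have already been computed, for instance by a cheminformatics toolkit, and stored per substituent. The thresholds and the forbidden-group labels are hypothetical defaults.

```python
def passes_exclusion_rules(sub, max_mw=250.0, max_atoms=20,
                           forbidden=frozenset({"charged", "isocyanate"})):
    """Apply simple first-pass exclusion rules to one substituent.

    `sub` holds precomputed properties: 'mw' (g/mol), 'atom_count' (all atoms,
    including hydrogens) and 'groups' (labels for notable functional groups).
    The thresholds and forbidden labels are illustrative defaults only.
    """
    if sub["mw"] > max_mw or sub["atom_count"] > max_atoms:
        return False
    return not (set(sub["groups"]) & forbidden)


substituents = {
    "methyl": {"mw": 15.0, "atom_count": 4, "groups": []},
    "phenyl": {"mw": 77.1, "atom_count": 11, "groups": ["aryl"]},
    "ammoniomethyl": {"mw": 31.1, "atom_count": 7, "groups": ["charged"]},
    "bulky_group": {"mw": 310.4, "atom_count": 38, "groups": ["aryl"]},
}
kept = [name for name, props in substituents.items() if passes_exclusion_rules(props)]
print(kept)  # ['methyl', 'phenyl']
```

These checks are cheap, so it is natural to run them before the computerised comparison with the binding site model rather than after it.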
The computational comparison used to further limit the lists may be effected on a list-by-list basis, with, for each list, the selected members of the other lists remaining constant or, preferably, with the other substituent positions on the template being vacant. Where the comparison does not involve such vacant sites, these "selected members" may be chosen on the basis of perceived compatibility with the binding site; conveniently, however, once the first list (preferably the shortest) has been evaluated, a highly compatible member of that list will be the invariant selected member for evaluation of the next list, and so on. Advantageously, once highly compatible members of the other lists have been identified, the first list will be re-evaluated with highly compatible members of the other lists being the invariant selected members.
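The list-by-list procedure with invariant selected members works like a round of coordinate descent. Evaluate the shortest list with the other positions fixed, fix its best member, move to the next list, then revisit the first list. The Python sketch below assumes a hypothetical `score_fn` that scores a complete selection (lower is more favourable). Toy pairwise scores stand in for the binding site comparison.

```python
def sequential_selection(lists, score_fn, initial):
    """List-by-list selection with the other positions held invariant.

    lists:   dict position -> candidate members
    initial: dict position -> starting member for every position
    score_fn(selection) scores a full selection; lower = more favourable.
    The shortest list is evaluated first, then the next, and so on; the
    first list is then re-evaluated against the members chosen meanwhile.
    """
    order = sorted(lists, key=lambda p: len(lists[p]))
    selected = dict(initial)
    for position in order + order[:1]:
        selected[position] = min(
            lists[position],
            key=lambda m: score_fn({**selected, position: m}))
    return selected


# Hypothetical pairwise scores for a two-position template.
pair_scores = {("a1", "b1"): -1.0, ("a1", "b2"): -3.0, ("a1", "b3"): -2.0,
               ("a2", "b1"): 0.0, ("a2", "b2"): -5.0, ("a2", "b3"): -1.0}
lists = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}

result = sequential_selection(lists, lambda s: pair_scores[(s["A"], s["B"])],
                              initial={"A": "a1", "B": "b1"})
print(result)  # {'A': 'a2', 'B': 'b2'}
```

The toy data shows why the re-evaluation matters. With `b1` fixed, `a1` looks best; once `b2` has been chosen, `a2` becomes the better partner.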
Alternatively, and much more preferably, the computational comparison may be significantly simplified by preselection of one (or more if necessary) invariant locations and conformations for the template within the binding site model, followed by comparison, on a list-by-list basis, for individual members of the lists and for an incompletely substituted template, eg. the template carrying only the substituent(s) deriving from the individual list member being investigated. In this way, alternative orientations of a list member which satisfies basic requirements such as appropriate size and functionality (eg. charge, lipophilicity, hydrogen bond donor/acceptor, etc.) may be scrutinised to improve the predictive value of the ranking which is the result of the comparison.
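With the template fixed at a preselected pose, each list member can be scored on its own by trying several orientations and keeping the best. The Python sketch below makes three simplifying assumptions. Orientation is reduced to a torsion angle about the attachment bond, sampled every 30 degrees. Each member's score is a toy function of that angle, whereas a real implementation would evaluate the partially substituted template against the binding site model. Lower scores are treated as more favourable.

```python
import math


def best_orientation_score(score_at, torsions_deg):
    """Score one substituent on a template held at a preselected pose,
    trying several torsion angles about the attachment bond and keeping
    the most favourable (lowest) value."""
    return min(score_at(t) for t in torsions_deg)


def rank_list_members(members, torsions_deg=range(0, 360, 30)):
    """members: dict name -> function(torsion in degrees) -> score."""
    scores = {name: best_orientation_score(fn, torsions_deg)
              for name, fn in members.items()}
    return sorted(scores, key=scores.get), scores


# Toy orientation-dependent scores (hypothetical functional forms).
members = {
    "tbutyl": lambda t: 2.0 + math.cos(math.radians(t)),
    "phenyl": lambda t: -3.0 * math.cos(math.radians(t - 90)),
    "methyl": lambda t: -1.0,
}
ranking, scores = rank_list_members(members)
print(ranking)  # ['phenyl', 'methyl', 'tbutyl']
```

Scored at a single fixed torsion of 0 degrees, phenyl would rank last (0.0 versus -1.0 for methyl). This illustrates why scrutinising alternative orientations can change the ranking.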

For step (1) of the process of the invention, one can conveniently input 3-D structural information about the binding site (eg. X-ray crystallographic analyses) from published sources, preferably sources which are computer-accessible.

The binding site model generated in step (2) conveniently consists of a representation of those regions of the binding site that can be considered capable of molecular interaction with a xenobiotic or other molecule, labelled according to the nature and geometry of the possible interactions, eg. hydrogen bond donor sites, hydrogen bond acceptor sites, aliphatic and aromatic lipophilic sites, ionic and metal-binding sites, etc.
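A binding site model of labelled interaction regions maps naturally onto a list of typed points. The Python sketch below makes two assumptions. Each site is labelled with the ligand feature it favours; the text's labels could equally refer to the receptor side. Satisfaction is judged by distance alone, so any directional (vector) component of hydrogen-bond sites is ignored. The coordinates are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionSite:
    kind: str            # ligand feature favoured here: "donor", "acceptor", "lipophilic", ...
    position: tuple      # (x, y, z) in angstroms, receptor frame
    tolerance: float = 1.0


def count_satisfied_sites(sites, ligand_atoms):
    """ligand_atoms: list of (feature, (x, y, z)) pairs.

    A site counts as satisfied when some ligand atom with the matching feature
    lies within the site's distance tolerance. Directions of hydrogen-bond
    vectors are ignored in this simplified sketch.
    """
    return sum(
        any(feature == site.kind and math.dist(pos, site.position) <= site.tolerance
            for feature, pos in ligand_atoms)
        for site in sites
    )


model = [
    InteractionSite("donor", (0.0, 0.0, 0.0)),
    InteractionSite("acceptor", (3.0, 0.0, 0.0)),
    InteractionSite("lipophilic", (0.0, 4.0, 0.0), tolerance=1.5),
]
ligand = [("donor", (0.4, 0.3, 0.0)),
          ("lipophilic", (0.0, 5.2, 0.0)),
          ("acceptor", (5.0, 0.0, 0.0))]
print(count_satisfied_sites(model, ligand))  # 2
```

A count of satisfied sites is the crudest possible score. It is only meant to show how a labelled site model can drive the favourable/unfavourable comparison used in steps (5) and (6).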

DETAILED DESCRIPTION OF THE INVENTION

Briefly put, the process of the invention involves the following steps: 

Construction of a virtual combinatorial library based around a template chemistry considered appropriate for the target molecule and amenable to combinatorial synthesis;

Screening of members of the library based on their interaction with a target receptor;

Synthesis and testing of representative elements of the library as single compounds using a variety of synthetic protocols.

With a known target molecule structure, this process can be done efficiently and accurately and overcomes various difficulties associated with applying combinatorial chemistry or Structure-Based Drug Design.

Firstly, the process offers all the advantages of an array-based combinatorial library (single compounds, a wider variety of chemistries) whilst sidestepping the problem of small library arrays. This is because a very large virtual library is considered and screened computationally, leaving only a small number of compounds to be synthesised and tested.

Secondly, the need to have one synthetic protocol to cover a wide variety of chemistries is relaxed. The synthesis route can be tailored to accommodate a larger range of chemistries than could be considered by an automated method. Solution and solid phase methods can be used with protection and deprotection steps as required. This means that a larger virtual library can be considered and thus the chance of locating active compounds is increased. The process also allows for simple functional group transformations within the starting materials to increase further the diversity of the virtual library.

Thirdly, by restricting the design process to molecules which are accessible by specified synthetic routes, one avoids the problems often associated with rational drug design, ie. uncertain synthetic feasibility and slow feedback between design and experiment. The process yields a set of compounds based around a common template which can be rapidly synthesised and assayed for activity against a given target. Such a set of compounds might form an immediate QSAR (Quantitative Structure Activity Relationship) training set, in contrast to other drug discovery paradigms where further work would be necessary to derive an equivalent QSAR set. A primary advantage over a traditional medicinal chemistry approach, which would involve obtaining a lead by a screening process and then synthesising a large number of analogues to provide an SAR set, is thus one of cost-effectiveness.

In the process of the invention, each member of the virtual library consists of a common template with different substituents attached to it. The substituents are derived from accessible chemical reagents, and it is variation in these substituents at each template attachment point which causes the combinatorial explosion in the number of individual molecules in the library. The library has a synthesis route or strategy (sometimes referred to herein as the template chemistry) associated with it whereby individual members are synthesised from available chemical reagents. The template itself can be an available chemical or can be formed during the chemical reactions (e.g. in ring-forming reactions). The available reagents may also undergo general molecular transformations before they are attached to the template and become substituents.

The technological aspects of the process may be separated into four stages. The design specification stage determines the constraints which are to be applied and explored during the computational screening of the virtual library. Obviously, these constraints include the actual specification of the library, the 3D structure of the receptor and any specific constraints derived from the receptor to judge the quality of library members. The second stage involves selecting chemical reagents and screening the corresponding substituents which are used to form members of the virtual library. The substituent screening is based on the structure of the receptor. The accepted substituents are further assessed and filtered using a variety of computer-aided techniques and chemical considerations. The third stage involves enumeration of the virtual library, i.e. production of the full library after all rejected substituents have been deleted. The final computational stage of the procedure is to perform simple checks and calculations on the enumerated library and arrive at a ranking of the molecules in the virtual library for synthesis and testing.
Each of these four stages will now be outlined in detail. 

The three aspects of design specification which are discussed below are template selection, template positioning and the design criteria, which will be discussed separately despite being inter-related.

Specification of the design criteria involves careful study of the target macromolecule. Thus, decisions need to be taken at this stage about which X-ray structure(s) of the receptor are to be used (if more than one is available), and whether some refinement by molecular dynamics/molecular mechanics needs to be carried out in order to generate a more accurate starting point for molecular design. Typically more than one snapshot of the receptor structure will be used in successive experiments. It is also necessary at this stage to decide on the key functionalities in the active site with which the substituents on the candidate compounds are to interact. A 'design model' is then generated for each template attachment point, eg. using the Design Model Generation functionality of PRO_LIGAND (see Clark et al. (supra)). A design model consists of a number of interaction sites which originate from specified receptor atoms and may be either vectors (denoting favourable positions and directions for hydrogen bond interactions with the active site) or points (denoting positions of favourable lipophilic contact with the active site) (see Bohm, J. Comput.-Aided Mol. Design 6: 61 and 593 (1992) and Klebe, J. Mol. Biol. 237: 212 (1994)). The vectors and points are labelled to indicate the particular chemistry they represent; thus D-X and A-Y vectors represent potential hydrogen bond donor and acceptor positions respectively. Similarly, L and R sites represent aliphatic and aromatic lipophilic sites respectively. The density, positions and orientations of the interaction sites are encoded in a rule-base which can be edited by the user and is based on a statistical examination of experimentally preferred intermolecular contacts (see Klebe (supra)).
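A design model as described above can be pictured as a small data structure. The following is a minimal sketch, not the PRO_LIGAND format: the class names `InteractionSite` and `DesignModel`, the field layout and the validation rule (two end points for a vector site, one for a point site) are illustrative assumptions. The site kinds follow the labels used in the text (D-X, A-Y and the V-W link vectors; L and R points).

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]

# Site kinds named in the text; the split into vectors and points follows it,
# but the representation itself is a hypothetical illustration.
VECTOR_KINDS = {"D-X", "A-Y", "V-W"}   # H-bond donor/acceptor and link vectors
POINT_KINDS = {"L", "R"}               # aliphatic / aromatic lipophilic points


@dataclass(frozen=True)
class InteractionSite:
    kind: str                  # e.g. "A-Y" or "L"
    points: tuple[Vec3, ...]   # two end points for a vector, one for a point
    receptor_atom: int         # index of the receptor atom the site originates from

    def __post_init__(self) -> None:
        if self.kind in VECTOR_KINDS:
            expected = 2
        elif self.kind in POINT_KINDS:
            expected = 1
        else:
            raise ValueError(f"unknown site kind {self.kind!r}")
        if len(self.points) != expected:
            raise ValueError(f"{self.kind} site needs {expected} point(s)")


@dataclass
class DesignModel:
    attachment_point: str      # which template attachment point this model serves
    sites: list[InteractionSite] = field(default_factory=list)

    def by_kind(self, kind: str) -> list[InteractionSite]:
        """Return all sites of one kind, e.g. every lipophilic 'L' point."""
        return [s for s in self.sites if s.kind == kind]
```

One such model would be built per template attachment point. Rejecting malformed sites at construction time keeps the later matching code simpler.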
The purpose of the molecular template is to hold in position the substituents which will make hydrogen bonds, lipophilic contacts or other favourable interactions with the binding site. An advantage of using structural information in the choice of the template chemistry is that knowledge of the receptor can be used to increase the chances of the library containing active molecules. A number of important issues can be identified in the selection of the template chemistry.

- The synthetic chemistry associated with the template should be relatively accessible and capable of delivering a wide diversity of substituents at a number of attachment points.

- Ideally, the template itself should be capable of making a number of favourable contacts with the receptor. This aids in establishing the position of the template and increases the likelihood that the library will contain active molecules.

- In some cases it is possible to infer likely templates from known inhibitors or substrates. For example, in the search for a thrombin inhibitor, the known thrombin inhibitor PPACK (D-phenylalanyl-prolyl-arginyl-chloromethylketone) contains a central proline moiety which could be used as a template. Alternatively, one could choose a known substructure with strong binding (e.g. guanidinium in the S1 pocket of thrombin) which can be pre-positioned and used to search for potential templates (ie. using the 'anchor' technique referred to above).

- Templates can be designed de novo, using structure-based techniques. This could mean using a de novo design method or a receptor-based database screening strategy. It is also advantageous to search reaction databases for suitable ring forming reactions which, for example, would give rise to beta-sheet mimetics.

- It is desirable that the template has restricted conformational freedom so that only limited numbers of alternative positions for the template need be considered.

The process of template selection may thus involve close collaboration between modellers and synthetic chemists, 
the former providing expertise about the requirements of the templates in terms of molecular interactions at the binding 
site and the latter giving guidance concerning the synthetic feasibility of any choices made. The result of the template 
selection process is a set of scaffolds chosen to achieve the best architecture in the active site and to minimise the 
synthetic effort required to prepare them. In practice, the decision about which templates to pursue will be a balance 
between the variety of factors discussed above. 

It should be emphasized that, unlike many of the combinatorial chemistry techniques described hitherto, it is not necessary that the template or the candidate compounds be peptides or peptoids.

Having chosen the set of templates to be used in the active site of interest the next task is to position the templates 
appropriately within the site. In principle, there will be a very large number of orientations of a given template in the 
site (although this number can be reduced if the chosen template makes a specific interaction with the binding site 
itself). What is required is to select a subset of these positions which place the template in such a way as to facilitate 
the molecular interactions that will be formed by the substituents once they are attached. 

This placement process could be achieved automatically by means of various objective docking protocols based on molecular mechanics or empirically based energy calculations (see Blaney, Perspect. Drug Disc. Des. 1:301 (1993)) or geometric positioning upon interaction sites (see Bohm, J. Comput.-Aided Mol. Des. 8:623 (1994)). The result of template positioning is a position, or number of positions, in 3D coordinate space for each of the templates. The chosen orientations are saved for future reference.

The process of substituent selection involves a number of steps:

- Searching the Available Chemicals Directory (ACD) of MDL Information Systems Inc., San Leandro, California, US (and/or other chemical compound databases) to find potential substituents for a given template.

- Computationally screening these potential substituents, eg. using techniques such as those used in the de novo design program PRO_LIGAND (see Clark et al. (supra)).

- Assessing and deciding on the preferred substituents at each position.

Each of these steps is explained more fully below. It is important to realize that substituents attached to different attachment points are preferably tested independently of each other at this stage. This makes the process of performing detailed 3D checks on a large virtual library computationally efficient. This and other approximations inherent in this approach are discussed below.

Given a positioned template, it is possible to infer for each template attachment point the nature of the interaction(s) the corresponding substituent is to make with the active site (eg. hydrogen bond, lipophilic contact, etc.), the nature of the functional group required for a coupling reaction to the template (eg. acid chloride with a primary amine) and a distance range between the point of attachment to the template and the point of interaction with the active site.

These two (or more) substructural criteria with the associated distance range(s) constitute a viable query for a 3D database search using database searching tools, such as ISIS/3D from MDL Information Systems Inc., Unity from Tripos Associates Inc., and Chem-3D from Chemical Design Ltd., etc. The query can be made more sophisticated through a consideration of potential molecular transformations, or through the imposition of synthetic constraints on allowed chemistries in specified substructures. By using the ACD, the chance that all chosen substituents will be commercially available is maximised. In general, the search carried out should explore the conformational flexibility of the database molecules to ensure that as many as possible of the potential substituents at each position will be retrieved.
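The two-substructure, one-distance-range query described above can be illustrated with a short sketch. It assumes that the coupling-group atom and the interaction-group atom have already been located in each molecule. It also assumes that conformational flexibility is approximated by a precomputed set of conformers per molecule, which is cruder than the flexible searching performed by the commercial tools named in the text. `DistanceQuery` and `search` are hypothetical names.

```python
import math
from dataclasses import dataclass

Coord = tuple[float, float, float]


@dataclass(frozen=True)
class DistanceQuery:
    """Hypothetical 3D query: the coupling atom and the interaction atom must
    lie between d_min and d_max angstroms apart."""
    d_min: float
    d_max: float

    def matches(self, coupling_atom: Coord, interaction_atom: Coord) -> bool:
        return self.d_min <= math.dist(coupling_atom, interaction_atom) <= self.d_max


def search(query: DistanceQuery,
           conformers: dict[str, list[tuple[Coord, Coord]]]) -> list[str]:
    """Return the molecules for which ANY stored conformer satisfies the query.

    Each entry maps a molecule name to (coupling_atom, interaction_atom)
    coordinate pairs, one pair per stored conformer (an assumption standing in
    for a true conformationally flexible search).
    """
    return [name for name, pairs in conformers.items()
            if any(query.matches(c, i) for c, i in pairs)]
```

Accepting a molecule if any conformer fits mirrors the stated aim of retrieving as many potential substituents as possible. False positives are removed by the later 3D checks.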

For each template attachment point a file of potential substituents may be saved as 2D structures, eg. in MDL's SD format (see Dalby, J. Chem. Inf. Comput. Sci. 32:244 (1992)), and then the Converter program (available from MSI, San Diego, California, US) may be used to add the necessary hydrogen atoms and generate 3D coordinates for the structures.

The methods used for the computational screening of potential substituents may conveniently be techniques such as those used in the de novo design package PRO_LIGAND (see Clark et al. (supra)). As described earlier, each template attachment site has its own design model, and the template attachment sites themselves are appended to the design models according to the labels specified in the template file which is input to the program. By automatically labelling the potential substituents for each template attachment position with appropriate interaction and link sites, it is possible to use rapid algorithms to establish whether they can form good molecular interactions with the active site. For more details, see Clark, J. Comput.-Aided Mol. Des. 9:13 (1995) and Murray, J. Comput.-Aided Mol. Des. 9:381 (1995).

The flexibility of this approach is enhanced by the ability to detect specified functional groups and replace them with another group. This increases the diversity in the virtual library that is computationally screened and so increases the chance of finding active compounds. The computational deprotection of protected functional groups is one example of how this feature might be used.

The molecular transformation may be controlled by rules containing a SMILES-like notation (see Weininger, J. Chem. Inf. Comput. Sci. 28:31 (1988)) for the substructures together with a number of integers. Thus, for instance, one rule may indicate that up to three silyl ethers are to be replaced by hydroxyl groups in any molecule. The geometry for the transformed part of the molecule is rebuilt atom by atom using a rule-based procedure and then relaxed with a molecular mechanics minimisation.
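A rule of the form "pattern, replacement, maximum count" can be sketched as follows. This is only a toy: it performs literal substring replacement on SMILES text, so it recognises a protecting group only when it is written in exactly one form. A real implementation would match substructures on the molecular graph and then rebuild and minimise the geometry, as the text describes. `TransformRule`, `apply_rules` and the TBS rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformRule:
    """Hypothetical rule: replace `pattern` by `replacement` at most `max_count` times."""
    pattern: str
    replacement: str
    max_count: int


def apply_rules(smiles: str, rules: list[TransformRule]) -> str:
    # Toy only: string replacement on SMILES text, not graph substructure matching.
    for rule in rules:
        smiles = smiles.replace(rule.pattern, rule.replacement, rule.max_count)
    return smiles


# Deprotect up to three tert-butyldimethylsilyl (TBS) ethers, written here as
# O[Si](C)(C)C(C)(C)C, back to free hydroxyl groups.
TBS_TO_OH = TransformRule("O[Si](C)(C)C(C)(C)C", "O", 3)
```

Applied to the bis-TBS ether of ethylene glycol, `C(O[Si](C)(C)C(C)(C)C)CO[Si](C)(C)C(C)(C)C`, the rule yields `C(O)CO`. The integer cap reproduces the "up to three silyl ethers" behaviour of the rule described above.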

A similar approach is used to protonate or deprotonate certain functional groups specified by the user in order that the molecules to be placed in the active site have realistic protonation patterns. Once the molecules have been subjected to all the transformations requested by the user, they are passed on to the initial molecular property screens.

Before subjecting the potential substituents to more computationally demanding subgraph isomorphism and directed tweak checks, some rapid molecular property screens may be used to eliminate unsuitable structures. Thus, acceptable ranges may be set for a number of properties, eg.

- Molecular weight
- Number of atoms
- Log P (eg. calculated using the method of Viswanadhan, J. Chem. Inf. Comput. Sci. 29:163 (1989))
- Number of rotatable bonds

Any substituents which fall outside the acceptable ranges may thus be automatically rejected. This is useful, for example, when the database entry contains more than one component. The program should automatically separate the components and treat each one as a separate substituent. The screen based on the number of atoms tends to remove the undesirable component, which is often a counterion. The code can also screen out duplicates.
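The property screens, component separation and duplicate removal described above can be sketched as follows. The sketch assumes that the properties have already been computed elsewhere, that "number of atoms" means heavy atoms, and that duplicates are detected by identical (canonical) SMILES strings. The numeric windows in `RANGES` are invented placeholders, because the text gives no limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    smiles: str          # a single component
    mol_weight: float
    n_atoms: int         # heavy atoms (assumption)
    log_p: float
    n_rotatable: int


# Hypothetical acceptance windows (inclusive); the source gives no values.
RANGES = {
    "mol_weight": (50.0, 350.0),
    "n_atoms": (6, 40),
    "log_p": (-2.0, 4.0),
    "n_rotatable": (0, 6),
}


def split_components(smiles: str) -> list[str]:
    """A multi-component entry (e.g. a salt) separates components with '.'."""
    return [part for part in smiles.split(".") if part]


def passes(c: Candidate) -> bool:
    return all(lo <= getattr(c, prop) <= hi for prop, (lo, hi) in RANGES.items())


def screen(candidates: list[Candidate]) -> list[Candidate]:
    """Drop exact duplicates (first occurrence wins), then apply the range screens."""
    seen: set[str] = set()
    kept = []
    for c in candidates:
        if c.smiles in seen:
            continue
        seen.add(c.smiles)
        if passes(c):
            kept.append(c)
    return kept
```

For a hydrochloride salt such as `CC(N)Cc1ccccc1.Cl`, the chloride component fails the atom-count window and is discarded, while the amine is retained. This is the counterion behaviour noted in the text.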

A further initial screen on substituents may be employed for some complex template chemistries. Thus, for example, if in a ring forming reaction one chemical reagent gives rise to two substituents on the template, then the two corresponding template attachment points will have the same list of available chemicals associated with them. Specific checks should ensure that only chemical reagents which have provided a substituent to pass all screens for the first template attachment point are considered for the provision of substituents for the second template attachment site.
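The paired-site check for ring forming reactions amounts to running the second site's screens only on the first site's survivors. In the minimal sketch below, the two screening callables are hypothetical stand-ins for the full screening pipeline at each attachment point.

```python
from typing import Callable, Iterable


def paired_site_screen(
    reagents: Iterable[str],
    passes_site1: Callable[[str], bool],
    passes_site2: Callable[[str], bool],
) -> list[str]:
    """Return reagents whose substituents pass all screens at BOTH attachment
    points. The site-2 screen is only ever called on site-1 survivors."""
    site1_survivors = [r for r in reagents if passes_site1(r)]
    return [r for r in site1_survivors if passes_site2(r)]
```

Besides enforcing the chemical constraint, the ordering avoids spending the expensive 3D checks at the second site on reagents that are already ruled out.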

The first step in the subgraph isomorphism matching process is to label the potential substituent with the appropriate interaction sites. This is accomplished by means of a rule-based procedure where each rule denotes a substructure in the SMILES-like notation mentioned earlier and indicates if and how each of the atoms in that substructure should be labelled. Thus, for example, one rule might instruct the program to search the substituent for any matches to a particular specified substructure (eg. C(=NH)N(H)H) and to label the second and fourth atoms of the match as X sites and the third and fifth atoms of the match as D sites. A powerful regular expression-based syntax is available within the SMILES-like notation which permits very flexible definitions of the rules; for instance, a further rule might indicate that any OH or NH group attached to a carbon atom should be labelled as a donor group.

In addition to the interaction sites described earlier, it is also desirable to label each substituent with link sites. These denote the vector site in the structure where the potential substituent will join to the template. The link sites are assigned in an identical manner to the interaction sites. Thus, for instance, a further rule might instruct the program to label the C-C bond in a CCO2H substructure as a link site (link site vectors are denoted V-W). (Note, however, that a link site does not have to correspond to an attachment point in an actual chemical reaction; for example, the formation of a peptide bond may be the chemical reaction associated with substituent attachment, but by defining the template to already contain the peptide bond, the C-C bond can be used as the computational link site. The chosen definition is dictated by convenience or computational efficiency, although in templates derived from ring forming reactions it is often essential to choose link sites which do not correspond to the bond formed in the chemical reaction.)

If, for any reason, a potential substituent cannot be assigned either interaction sites or link sites, it is automatically rejected. Otherwise, the program should proceed to seek a 3D match between the interaction/link sites of the substituent and the interaction/link sites of the design model. This may be accomplished using the subgraph isomorphism algorithm of Ullmann (see J. ACM 23:31 (1976)), which has been used successfully in many chemical structure applications. In order to account for the conformational flexibility of the substituents in this process, distance bounds matrices are calculated using the directed tweak routines, which seek to establish the maximum and minimum distances that can be attained between all pairs of atoms through rotation about rotatable bonds (see Murray, J. Comput.-Aided Mol. Des. 9:381 (1995)). The subgraph isomorphism algorithm then uses these distance ranges in establishing a match in the manner described by Clark, J. Mol. Graphics 10:194 (1992).
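Matching labelled substituent sites against design-model sites under distance bounds can be sketched with a plain backtracking search. This is a simplification, not Ullmann's algorithm: it omits Ullmann's refinement procedure and simply extends a partial mapping site by site. The sketch assumes that the lower and upper bounds matrices are already available (in the process described, they come from the directed tweak routines) and that a single tolerance `tol` widens every bound. `match_sites` is a hypothetical name.

```python
import math
from typing import Optional, Sequence

Point = tuple[float, float, float]


def match_sites(
    sub_labels: Sequence[str],
    lower: Sequence[Sequence[float]],   # lower[i][j]: min attainable distance, sites i, j
    upper: Sequence[Sequence[float]],   # upper[i][j]: max attainable distance
    model_labels: Sequence[str],
    model_xyz: Sequence[Point],
    tol: float = 0.5,
) -> Optional[list[int]]:
    """Map every substituent site i to a distinct design-model site with the same
    label, such that each pairwise model distance lies within the substituent's
    [lower - tol, upper + tol] bounds. Returns the first mapping found, or None."""
    n = len(sub_labels)
    mapping: list[int] = []

    def consistent(i: int, m: int) -> bool:
        if model_labels[m] != sub_labels[i] or m in mapping:
            return False
        for j, mj in enumerate(mapping):          # compare with sites already placed
            d = math.dist(model_xyz[m], model_xyz[mj])
            if not (lower[i][j] - tol <= d <= upper[i][j] + tol):
                return False
        return True

    def extend(i: int) -> bool:
        if i == n:
            return True
        for m in range(len(model_labels)):
            if consistent(i, m):
                mapping.append(m)
                if extend(i + 1):
                    return True
                mapping.pop()                     # backtrack
        return False

    return list(mapping) if extend(0) else None
```

For example, a substituent with a link site V and a donor D that can lie 3 to 5 Å apart maps onto the model donor 4 Å from the template link site, not the one 8 Å away. Because the bounds ignore correlations between distances, a match found here is necessary but not sufficient, which is the point made in the next paragraph.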

If no match is found for the substituent, it is rejected and the algorithm returns to consider the next available 
substituent. 

The finding of a match for a substituent in the subgraph isomorphism check is not necessarily a sufficient condition for a substituent to be accepted. This is because the distance bounds matrix does not include correlation effects, i.e. the effect that one interatomic distance having one value might have on the possible values attainable by the other interatomic distances. Thus, in order to establish whether the substituent is in fact a viable one for the template attachment point in question, a specific matching conformation should be generated using some form of conformational exploration procedure (see Clark, J. Mol. Graphics 10:194 (1992)).

The procedure adopted for the experimental trials reported below is based on the directed tweak algorithm (see Hurst, J. Chem. Inf. Comput. Sci. 34:190 (1994)), which was originally developed for 3D database searching applications, where it has been shown to be both efficient and effective. Its utility in the field of de novo design has recently been demonstrated (see Murray, J. Comput.-Aided Mol. Des. 9:381 (1995)).

The directed tweak algorithm takes the match established by the subgraph isomorphism algorithm and then seeks to verify it by performing a torsional optimisation of the rotatable bonds in the substituent. After a potential match has been located, the substituent is attached to the template. The bond where attachment occurs is treated as rotatable. The following cost function is minimised by a steepest descent method:

$$F = \frac{1}{N}\sum_{i=1}^{N} a_i d_i^2$$

where the summation occurs over all $N$ interaction sites, and $d_i$ is the distance between the $i$th substituent interaction site and the design model interaction site with which it is matched. $a_i$ is a coefficient which depends upon the type of interaction site being matched and is a simple function of the tolerances used in the subgraph isomorphism algorithm. This cost function differs from that used by Hurst (J. Chem. Inf. Comput. Sci. 34:190 (1994)) and Murray (supra) in that the distances between pairs of sites are not included, only the absolute distance between the two matched sites. This is possible because the template attachment site provides a fixed point of reference in the design model coordinate space. This means that there are fewer terms in the cost function expression, and it is likely that the simpler expression has fewer local minima. There is also no need to check the chirality of the conformations produced. These advantages make the approach considerably faster.

After minimisation, the conformation is accepted if it passes the following criteria. The value of the cost function must be less than a user-defined maximum (typically about 0.5 Å²), and the substituent must not be clashing with the receptor, with the template or with itself. If the conformation fails these checks, the tweak routines are used to find an alternative conformation; the procedure is repeated until an acceptable geometry is located or a user-definable number of attempts has been exceeded (see Murray (supra)).
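The cost function $F$ and the accept-or-retry loop can be sketched as follows. The sketch does not implement the directed tweak torsional optimisation itself. Instead, it assumes a caller-supplied `propose(attempt)` function that returns the matched substituent site coordinates of each successive trial conformation, and a caller-supplied `clashes` check. Both are hypothetical stand-ins. The default threshold of 0.5 follows the typical value quoted above.

```python
import math
from typing import Callable, Optional, Sequence

Point = tuple[float, float, float]


def cost(sub_sites: Sequence[Point], model_sites: Sequence[Point],
         weights: Sequence[float]) -> float:
    """F = (1/N) * sum_i a_i * d_i**2, with d_i the distance of the i-th matched pair."""
    n = len(sub_sites)
    return sum(a * math.dist(p, q) ** 2
               for p, q, a in zip(sub_sites, model_sites, weights)) / n


def first_acceptable(
    propose: Callable[[int], Sequence[Point]],       # stand-in for the tweak step
    model_sites: Sequence[Point],
    weights: Sequence[float],
    clashes: Callable[[Sequence[Point]], bool],      # receptor/template/self clash test
    f_max: float = 0.5,
    max_attempts: int = 20,
) -> Optional[tuple[int, Sequence[Point]]]:
    """Return (attempt, sites) for the first conformation with F < f_max and no
    clash, or None once max_attempts trial conformations have been rejected."""
    for attempt in range(max_attempts):
        sites = propose(attempt)
        if cost(sites, model_sites, weights) < f_max and not clashes(sites):
            return attempt, sites
    return None
```

A conformation whose two sites each sit 1 Å from their targets gives $F = 1.0$ with unit weights and is rejected. One within 0.2 Å and 0.3 Å gives $F = 0.065$ and is accepted.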

The substituent still attached to the template is then optionally minimised using a molecular mechanics energy function. This is done in the presence of the receptor (which is treated as rigid), and a cut-off on the long range terms of 8 Å is usually applied. An estimate of the strain energy in the receptor-bound conformation is obtained by performing the minimisation (starting from the tweak-generated geometry) in the absence of the receptor and subtracting from this energy the intramolecular energy of the receptor-bound conformation. During these calculations, the template part of the molecule is held rigid. All molecular mechanics calculations may be done employing the fast and approximate 'Clean' forcefield developed by Hahn (J. Med. Chem. 38:2080 (1995)). Partial charges are calculated using the method of Gasteiger and Marsili (see Tetrahedron 36:3219 (1980)). The Clean forcefield bears many similarities to the 'generalised atom' forcefield incorporated in the Chem-X software (available from Chemical Design Ltd, Chipping Norton, UK) in that it does not rely on extended forcefield atom types. Only element type, hybridisation and bond type are used
in calculating the energy of a system (see Hahn (supra)). A number of minor adjustments may be made in the implementation of the forcefield. The first is that all hydrogen atoms are treated specifically and are assigned an sp3 atom type. The second is that van der Waals' radii for potential hydrogen-bond-forming atom pairs are scaled, typically by 0.8. It should be realised that the purpose of the Clean forcefield in the process of the invention is to provide a rough clean-up of the substituents, which may possess distorted geometries caused by unrealistic torsion angles. The forcefield must be robust, in the sense that it must be able to cope with any chemistries that are given to it, and this is why a generalised atom forcefield is the most obvious choice. Additionally, it must meet the approximate accuracy criteria, and in this context the accuracy of the intermolecular terms is important. It was after analysis of intermolecular geometries obtained using Clean that the scaling of the hydrogen bonding van der Waals' radii was introduced. It is believed that the forcefield produces improved and reasonable geometries, at least when some portion of the molecule (here the template) is held fixed in the receptor.

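The minimised conformation of the substituent (still attached to the template) is then assigned a score using a scoring function developed by Bohm for use in the de novo design program LUDI (see J. Comput.-Aided Mol. Des. 8:243 (1994)). Bohm's scoring function permits an approximate calculation of the binding free energy of the substituent and template in terms of readily calculable quantities such as lipophilic contact surface area, the number and quality of hydrogen bonds formed and the number of rotatable bonds. Following Bohm, the form of the equation used is

$$\Delta G_{\mathrm{binding}} = \Delta G_0 + \Delta G_{\mathrm{hb}} \sum_{\text{h-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{\mathrm{ionic}} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{\mathrm{lipo}} \left|A_{\mathrm{lipo}}\right| + \Delta G_{\mathrm{rot}} N_{\mathrm{rot}}$$

where

$$f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$$

and

$$f_1(\Delta R) = \begin{cases} 1 & \Delta R \le 0.2\,\mathring{\mathrm{A}} \\ 1 - (\Delta R - 0.2)/0.4 & \Delta R \le 0.6\,\mathring{\mathrm{A}} \\ 0 & \Delta R > 0.6\,\mathring{\mathrm{A}} \end{cases}$$

and

$$f_2(\Delta\alpha) = \begin{cases} 1 & \Delta\alpha \le 30^\circ \\ 1 - (\Delta\alpha - 30)/50 & \Delta\alpha \le 80^\circ \\ 0 & \Delta\alpha > 80^\circ \end{cases}$$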

$f(\Delta R, \Delta\alpha)$ is a function which penalises hydrogen bonds whose geometry deviates from ideality. $\Delta R$ is the deviation of the H···O/N hydrogen bond length from 1.9 Å; $\Delta\alpha$ is the deviation of the hydrogen bond angle N/O-H···O/N from its ideal value of 180°. $\Delta G_0$ is a contribution to binding energy which is independent of interactions with the protein. Bohm suggests that this may be rationalised as a reduction in binding energy due to loss of translational and rotational entropy of the ligand. $\Delta G_{\mathrm{hb}}$ describes the contribution from an ideal hydrogen bond and $\Delta G_{\mathrm{ionic}}$ the contribution from an unperturbed ionic interaction. $\Delta G_{\mathrm{lipo}}$ denotes the contribution from lipophilic interactions, which is assumed to be proportional to the lipophilic contact surface between ligand and protein, $A_{\mathrm{lipo}}$. Finally, $\Delta G_{\mathrm{rot}}$ describes the loss of binding energy due to the freezing of internal degrees of freedom in the ligand. $N_{\mathrm{rot}}$ is the number of acyclic sp3-sp3 and sp3-sp2 bonds, excluding the rotations of terminal methyl and amine groups.

The values used for the various coefficients are those adopted by Bohm for the LUDI program: $\Delta G_0 = +5.4$, $\Delta G_{\mathrm{hb}} = -4.7$, $\Delta G_{\mathrm{ionic}} = -8.3$, $\Delta G_{\mathrm{lipo}} = -0.17$ (per Å² of lipophilic contact surface) and $\Delta G_{\mathrm{rot}} = +1.4$, all in kJ/mol. The coefficients were obtained by fitting the equation to the activities for ligand-receptor binding where crystallographic structures for the complexes were available (although a few geometries were obtained by docking the ligands into the receptor). The accuracy of the function is not expected to be better than 1.5 orders of magnitude in the binding affinity.
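The scoring function above translates directly into code. This minimal sketch assumes that the geometric inputs have already been measured: $(\Delta R, \Delta\alpha)$ for each hydrogen bond and ionic interaction, the lipophilic contact area in Å², and $N_{\mathrm{rot}}$. It does not detect hydrogen bonds or compute surfaces. `bohm_score` is a hypothetical name, and the coefficients are the LUDI values quoted above.

```python
# LUDI coefficients quoted in the text (kJ/mol; the lipophilic term is per square angstrom).
DG0, DG_HB, DG_IONIC, DG_LIPO, DG_ROT = 5.4, -4.7, -8.3, -0.17, 1.4


def f1(dr: float) -> float:
    """Penalty for deviation (angstrom) of the H...O/N length from the ideal 1.9."""
    if dr <= 0.2:
        return 1.0
    if dr <= 0.6:
        return 1.0 - (dr - 0.2) / 0.4
    return 0.0


def f2(da: float) -> float:
    """Penalty for deviation (degrees) of the N/O-H...O/N angle from the ideal 180."""
    if da <= 30.0:
        return 1.0
    if da <= 80.0:
        return 1.0 - (da - 30.0) / 50.0
    return 0.0


def bohm_score(hbonds: list[tuple[float, float]], ionic: list[tuple[float, float]],
               lipo_area: float, n_rot: int) -> float:
    """Estimated binding free energy (kJ/mol); more negative means tighter binding.
    hbonds and ionic hold one (dR, dalpha) pair per interaction."""
    hb = sum(f1(dr) * f2(da) for dr, da in hbonds)
    io = sum(f1(dr) * f2(da) for dr, da in ionic)
    return DG0 + DG_HB * hb + DG_IONIC * io + DG_LIPO * lipo_area + DG_ROT * n_rot
```

For one ideal hydrogen bond, one distorted bond with $\Delta R = 0.4$ Å and $\Delta\alpha = 55^\circ$ (weight 0.25), 100 Å² of lipophilic contact and two rotatable bonds, the estimate is 5.4 - 5.875 - 17.0 + 2.8 = -14.675 kJ/mol.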

Using this scoring function, it is possible to rank the accepted substituents according to the strength of interaction they are likely to make with the receptor by subtracting the pre-computed score for the template from the total score for the template-substituent combination.

Since the first-found conformation is not necessarily the highest scoring one available to the substituent, a user-specified number of acceptable conformations (typically 10 or more) will be sought and scored. After these conformations have been examined, the substituent geometry with the highest score is saved for future reference.
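Selecting the best conformation and ranking substituents by their own contribution can be sketched as follows. The sketch assumes that scores are Bohm-style binding free energies, so the "highest scoring" (most favourable) value is the most negative one. This reading is consistent with the 'Good Scores' list described below, which keeps scores less than a threshold. The function names are hypothetical.

```python
def substituent_contribution(total_score: float, template_score: float) -> float:
    """Score attributable to the substituent alone (kJ/mol; more negative = better)."""
    return total_score - template_score


def best_conformation(scored: list[tuple[str, float]]) -> tuple[str, float]:
    """From (conformation_id, total_score) pairs, keep the most favourable one."""
    return min(scored, key=lambda item: item[1])


def rank_substituents(best_totals: dict[str, float], template_score: float) -> list[str]:
    """Order substituents by their own contribution, most favourable first."""
    return sorted(best_totals,
                  key=lambda name: substituent_contribution(best_totals[name], template_score))
```

Because the template score is common to every substituent at one attachment point, subtracting it leaves the ranking unchanged. It does, however, make the reported numbers interpretable as per-substituent contributions.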

Once potential substituents have been located for each template attachment point, one can automatically enumerate the possibilities to produce the full library for all combinations of those substituents and the template. However, it is usually advisable to consider the substituent lists further so as to reduce the size of the enumerated library.
The output thus far for each template attachment point is a directory of substituent files, each containing scoring and database information. A directory of substituents, the receptor structure and the template molecule are read in to a graphical visualisation package. This package may be designed to allow the user to scroll quickly through the substituent list in any order whilst displaying the file name and any molecular properties that are present in the substituent files (eg. the strain energy, the Bohm score or the components of the score). The properties may be displayed in a spreadsheet running alongside the molecular visualisation. Substituents can be visualised in isolation, with the template or with the receptor structure.

A set of substituent structures is treated as a list on which operations can be performed by the user. For instance, one would probably want to store all structures with Bohm scores less than a given value in a new list of 'Good Scores'; one might also want to exclude all structures with high strain energies, and possibly remove bad structures judged by more subjective criteria (e.g. bad chemistries or geometries). The user can have full control over which list of structures is displayed. At any time the user can write a list to a new or old directory or remove a list from an old directory.
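The list operations described above are essentially filters that each produce a new list. The following minimal sketch assumes that each substituent record carries a Bohm score and a strain energy. The `Entry` fields and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    bohm_score: float   # estimated dG in kJ/mol (lower is better)
    strain: float       # strain energy (hypothetical units: kJ/mol)


def good_scores(entries: list[Entry], threshold: float) -> list[Entry]:
    """New 'Good Scores' list: Bohm score below the threshold."""
    return [e for e in entries if e.bohm_score < threshold]


def low_strain(entries: list[Entry], max_strain: float) -> list[Entry]:
    """Exclude structures with high strain energies."""
    return [e for e in entries if e.strain <= max_strain]


def without(entries: list[Entry], rejected: set[str]) -> list[Entry]:
    """Remove structures rejected on subjective grounds (bad chemistry or geometry)."""
    return [e for e in entries if e.name not in rejected]
```

Each operation returns a new list and leaves its input untouched, which matches the browser's model of keeping several named lists side by side.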

Coupled to the list functionality is a clustering facility which allows one to cluster a specified list on the basis of 2D chemical functionality. The clustering may be based on similar functionality available in PRO_LIGAND, which measures similarity by Tanimoto coefficients derived from bit string representations of the chemical structures (see Willett, J. Chem. Inf. Comput. Sci. 26:109 (1986) and Barnard, J. Chem. Inf. Comput. Sci. 32:644 (1992)). The bit strings may be specified by 172 atom-centred fragments generated from an analysis of 5000 structures in the Cambridge Structural Database (see Allen, J. Chem. Inf. Comput. Sci. 31:187 (1991)). Several different clustering algorithms are available, and one may use a hierarchical clustering method such as Complete Linkage or Ward's. (The number of structures clustered may typically be about 100 or less, so CPU time is not an issue.) A number of tools are available to help decide on the appropriate number of clusters for the specified lists. The output from the clustering is a new set of lists, each containing an individual cluster. These can be browsed and operated on as described above. Whilst the clustering is not always perfectly in line with chemical intuition, it is an extremely useful way of navigating through and keeping track of a fairly large number of substituents.
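Tanimoto similarity on bit strings combined with complete-linkage clustering can be sketched in a few lines. The sketch assumes that fingerprints are already available as Python integers used as bitsets; generating the 172 atom-centred fragment bits is not shown. It also uses a similarity cut-off as the stopping rule, which is one simple way of choosing the number of clusters. The naive O(n³) merge loop is acceptable for the roughly 100 structures mentioned above.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union


def complete_linkage(fps: list[int], min_sim: float) -> list[list[int]]:
    """Agglomerative clustering: repeatedly merge the pair of clusters whose LEAST
    similar members are most similar; stop when that similarity falls below min_sim.
    Returns clusters as lists of indices into fps."""
    clusters = [[i] for i in range(len(fps))]

    def link(c1: list[int], c2: list[int]) -> float:
        return min(tanimoto(fps[i], fps[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < min_sim:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Complete linkage takes the minimum pairwise similarity between clusters, so every member of a cluster is at least `min_sim` similar to every other member. This tends to give compact, chemically coherent groups.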

The final facility provided by the molecular browser is to rescore a list of substituents using the empirical Bohm score. Rescoring in this way is practical because tens of structures can be scored per second, and is useful because information gained during the scoring can be used to provide a graphical representation of the score. Hydrogen bonds or ionic interactions are located, marked and annotated with the contribution they make to the predicted binding affinity. This saves a lot of time in deciding which hydrogen bonds are formed and how good they are. It also points out hydrogen bonds which may be contributing to the score in an unrealistic way. Bonds that are considered rotatable are also marked so that the user can see which bonds are (or are not) contributing to the score. Finally, the grid used to establish the lipophilic contribution to the score is displayed graphically. Relevant grid points fall into several categories:

- lipophilic ligand atom in contact with lipophilic receptor atom
- lipophilic ligand atom in contact with polar receptor atom (or vice versa)
- polar ligand atom in contact with polar receptor atom
- lipophilic ligand atom in contact with nothing (i.e. solvent)
- polar ligand atom in contact with nothing (i.e. solvent)
- volume of ligand

The user can colour each of these grid point types, though in practice we have tended to use colours for the first three types only. The visualisation is useful because it displays aspects of ligand-receptor contact which are often difficult to assess quickly from looking at the complex alone.

After application of these tools, a smaller set of substituents is decided on for each of the template attachment points. The aspects which are considered in producing this list are:

- 2D diversity: Using the clustering tools and chemical knowledge, a diverse set of substituents may be chosen. For example, if there are 10 fluorinated derivatives of phenylalanine, only one need be chosen. Exploration of different chemistries is important because the scoring functions can only be expected to deliver approximate accuracy in the prediction of the binding affinity.

- 3D contacts: It is important to look at the contacts a substituent is predicted to make with the receptor and to form a judgement as to whether these seem reasonable or not. In particular, substituents which have a large amount of polar-nonpolar contact are suspect. There should also be an awareness of 3D diversity, and there should be an attempt to target molecules which explore different forms of receptor contact to make up for deficiencies in the scoring criteria.

- Synthetic considerations: There should be a consideration of synthetic feasibility. Although the strategy of making single compounds by the most appropriate protocol means that a larger diversity of substituents are synthetically accessible, there will still be some substituents which contain functionalities that are difficult to incorporate in any synthetic protocol. Additionally, where one compound is to be chosen from several similar possibilities, choices could be made on the basis of ease of availability or price of the compounds.

- Scores: The scores of the substituents (e.g. Bohm scores, forcefield energies, etc.) can be used to choose preferred substituents from among lists of similar compounds.
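The 2D-diversity and score criteria together suggest a simple rule: keep one representative per cluster of similar substituents, namely the best-scoring one. This minimal sketch assumes that clusters are lists of substituent names and that a lower Bohm-style score is better. It covers only these two criteria; the 3D-contact and synthetic judgements remain manual. The substituent names in the usage example are hypothetical.

```python
def pick_representatives(clusters: list[list[str]], scores: dict[str, float]) -> list[str]:
    """From each cluster of similar substituents keep the single best-scoring member
    (lowest estimated binding free energy)."""
    return [min(cluster, key=scores.__getitem__) for cluster in clusters]
```

For a cluster of three fluorophenylalanine analogues and a cluster of two tryptophan analogues, this returns one analogue from each, in line with the example of choosing only one of several fluorinated phenylalanines.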

The process of combinatorial enumeration simply involves forming a list of all the remaining substituents at each 
R-group position and then creating all possible combinations of them. Thus, given a template with three R-group positions and three substituents for each, the combinatorial enumeration procedure will produce 27 different molecules.
The geometries produced are based on the highest scoring geometries of the corresponding substituents. The resulting 
molecules are stored for further analysis or transfer to a 3D database. 
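Combinatorial enumeration is a Cartesian product over the per-position substituent lists. The sketch below reproduces the 3 × 3 × 3 = 27 example. It also shows why the earlier independent screening of substituents is efficient: only 3 + 3 + 3 = 9 substituents need the detailed 3D checks, instead of all 27 products. Combining the stored geometries is not shown, and `enumerate_library` and the labels are hypothetical.

```python
from itertools import product
from typing import Iterator


def enumerate_library(template: str,
                      substituent_lists: dict[str, list[str]]) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (template, {position: substituent}) for every combination."""
    positions = list(substituent_lists)
    for combo in product(*(substituent_lists[p] for p in positions)):
        yield template, dict(zip(positions, combo))


lists = {"R1": ["a1", "a2", "a3"], "R2": ["b1", "b2", "b3"], "R3": ["c1", "c2", "c3"]}
library = list(enumerate_library("T", lists))
n_screened = sum(len(v) for v in lists.values())   # substituents checked independently
```

Here `len(library)` is 27 and `n_screened` is 9. The gap between the product and the sum widens rapidly as lists grow. For example, three lists of 20 give 8000 molecules but only 60 substituent checks.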

The resulting molecules can be, but are not usually, minimised with the Clean forcefield and are then rescored in the same manner as the substituents. Estimated logP and molecular weight are also routinely calculated for the complete molecules.

In our applications, the complete molecules have also been subjected to evaluation using the CFF95 forcefield in
Discover (available from Molecular Simulations Inc., San Diego, California, US). Simplified, cut-down models of the
receptor are used, and minimisation and molecular dynamics are used to assess the quality of the designs. If the designs
are reasonably stable during dynamics and possess high-scoring snapshots, they are considered suitable for
synthesis.
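The acceptance test described above can be expressed as a simple filter over per-snapshot data from a dynamics run. The sketch below assumes that each design's trajectory has already been reduced to a ligand RMSD (in Å, relative to the starting pose) and an interaction score per snapshot; the field names, the higher-is-better score convention and the threshold values are hypothetical, since the text gives no numerical criteria.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Per-snapshot summary of one dynamics run (hypothetical format)."""
    name: str
    rmsd: list[float]    # ligand RMSD in Å relative to the starting pose
    scores: list[float]  # interaction score per snapshot; higher is better


def suitable_for_synthesis(traj: Trajectory, max_rmsd: float, min_best_score: float) -> bool:
    """Accept a design if it stays close to its starting pose throughout the run
    and at least one snapshot reaches the required score."""
    stable = max(traj.rmsd) <= max_rmsd
    has_high_scoring_snapshot = max(traj.scores) >= min_best_score
    return stable and has_high_scoring_snapshot


# Illustrative data and thresholds only.
runs = [
    Trajectory("design_1", rmsd=[0.0, 0.8, 1.1, 1.3], scores=[610.0, 655.0, 640.0, 630.0]),
    Trajectory("design_2", rmsd=[0.0, 1.9, 3.4, 4.2], scores=[700.0, 520.0, 410.0, 380.0]),
]
accepted = [t.name for t in runs if suitable_for_synthesis(t, max_rmsd=2.0, min_best_score=650.0)]
print(accepted)  # ['design_1']
```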

The final decision about which molecules to synthesise is made by considering all the data collected for the
substituents and enumerated molecules. The full library could be synthesised, or selected molecules can be chosen from
the full library. The possibility of using experimental design to choose the best candidates has been explored. The method
initially used was D-optimal design, which attempts to maximise the coverage of a specified property space in a subset
of molecules chosen from a larger library. In our explorations, the spread in the following properties was approximately
maximised:

the substituents from which each library member was derived 
estimated value of logP for each library member 

the hydrogen bond, the rotatable bond and lipophilic contributions to the Bohm score for each library member

Several constraints can be imposed on the design, such as inclusion or exclusion of compounds which are outside
a specified range of a molecular property. The general conclusion of this application of experimental design was that,
although it was useful, practical considerations, such as ease of synthesis of particular classes of compounds from
the full library, were usually more important.
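A D-optimal design seeks the subset whose design matrix X maximises det(X^T X). The sketch below uses simple greedy forward selection, which is a heuristic approximation rather than the exchange algorithms usually used for D-optimal design, and is not necessarily the method used here. It assumes the properties listed above have been standardised into columns of a NumPy array with an intercept column added; a boolean mask implements constraints such as excluding compounds outside a property range. The small ridge term is an implementation assumption that keeps the determinant defined while fewer rows than columns have been chosen, and all data values are hypothetical.

```python
import numpy as np


def greedy_d_optimal(X: np.ndarray, k: int, allowed=None, ridge: float = 1e-6) -> list[int]:
    """Greedily choose k rows of X that approximately maximise det(Xs^T Xs).

    X       -- (n, p) design matrix: an intercept column plus standardised properties.
    allowed -- optional boolean mask; False marks compounds excluded by constraints.
    """
    n, p = X.shape
    mask = np.ones(n, dtype=bool) if allowed is None else np.asarray(allowed, dtype=bool)
    candidates = [int(i) for i in np.flatnonzero(mask)]
    chosen: list[int] = []
    for _ in range(min(k, len(candidates))):
        best, best_logdet = None, -np.inf
        for i in candidates:
            rows = X[chosen + [i]]
            # The ridge keeps the matrix positive definite while fewer than p rows are chosen.
            _, logdet = np.linalg.slogdet(rows.T @ rows + ridge * np.eye(p))
            if logdet > best_logdet:
                best, best_logdet = i, logdet
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Hypothetical standardised properties (e.g. estimated logP and one Bohm-score term)
# for six library members: four "corners", the centre, and a point near the centre.
props = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0],
                  [1.0, -1.0], [-1.0, 1.0], [0.1, 0.1]])
X = np.column_stack([np.ones(len(props)), props])

print(sorted(greedy_d_optimal(X, k=4)))  # [1, 2, 3, 4]
allowed = np.array([True, True, True, True, False, True])  # member 4 fails a property constraint
print(sorted(greedy_d_optimal(X, k=4, allowed=allowed)))
```

Greedy selection favours members at the edges of property space, which is the intended "maximised spread"; the excluded member is simply never offered as a candidate.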

Most of the computationally intensive routines of the operating software for the process of the invention may be
written in Fortran, the data structure and data handling code in C, and the drivers and user interface parts in Global.
Global is a proprietary interpreted language designed for application to computer-aided molecular design. The main
use of Global is that, together with the chemical utilities and their associated data structure routines, it provides a
flexible environment for the operation of the process of the invention. A language which allows high-order chemical
design features and user input to be expressed succinctly and naturally makes the methods easy to program, amend
and debug. Global also makes mundane tasks such as I/O and memory management straightforward, and frees the
programmer to concentrate on the chemical design aspects of a programming task. Because there is no compilation
for the interpreted language, it is easy to adapt the drivers and run them interactively or in batch mode. The user can
either treat the Global files as input decks in the traditional sense or, if they have more confidence, can make fairly
significant changes to the order of operation of the drivers, introducing different screens for the substituents as they
see fit. Higher-level languages have shown their worth before in CAMD applications, as illustrated by Tripos's SPL
language or the various languages offered to MSI users.

In the method of the invention, the drug compound may, if desired, be formulated for administration, eg. via parenteral
or enteral routes, for example orally, rectally, nasally, transdermally, by injection or infusion, or into the lungs. Typical
administration forms include tablets, powders, capsules, suppositories, syrups, sprays, solutions, dispersions,
suspensions, emulsions and gels. Such compositions may contain conventional pharmaceutically acceptable carriers and
excipients, eg. water for injections, physiological saline, buffers, sweeteners, dispersants, bulking agents, etc.

EXAMPLE 

The generation of a library of thrombin inhibitors is described as an example of the present invention.
Thrombin is a trypsin-like serine protease recognised as a key enzyme within the coagulation cascade. Its primary
action is catalysis of the conversion of soluble fibrinogen to insoluble fibrin, which is the basis of thrombus formation
and blood clotting. In addition, thrombin has several other roles: control of pro- and anti-coagulant pathways in
the coagulation cascade, induction of platelet aggregation, and more general signalling via activation of a thrombin
receptor. As thrombin acts at the final step of both the extrinsic and intrinsic clotting cascades, it has attracted much attention
as a therapeutic target. Modulation of thrombin activity may be of use to prevent inappropriate thrombus formation, for 
example as a general anticoagulant as an adjunct to surgery or as a prophylaxis in various cardiovascular disorders 
such as myocardial infarction and unstable angina. Direct competitive inhibition of thrombin has been pursued by 
several pharmaceutical companies in an effort to obtain a new class of anticoagulants and antithrombotics, potentially 
with good oral bioavailability and improved efficacy and toxicity when compared with existing drugs. 

The present example relates to the design of a library of novel thrombin inhibitors which are potential drug
candidates. At this stage the quality of the designs was assessed in terms of an in vitro assay of thrombin inhibition.
Successful designs may be selected on the basis of the measured inhibition constant (Ki). In addition, the selectivity of the
compounds towards thrombin may be assessed by performing enzyme inhibition assays versus structurally related
serine proteases, such as trypsin and Factor Xa. In general, enzyme specificity is an important consideration because
an intended thrombin inhibitor may also inhibit fibrinolytic enzymes and hence exert an undesired thrombotic effect.

The first stage in the application of the process was the identification of an appropriate template structure and an 
associated synthetic strategy. This was achieved by analysis of known thrombin inhibitors in order to identify chemical 
moieties which appear to contribute favorably to binding. The source of the data was the Brookhaven protein database. 
From the available thrombin-ligand complexes it was decided to select as a template the proline moiety from the inhibitor 
PPACK (D-phenylalanyl-prolyl-arginyl-chloromethylketone). 

This was chosen for several reasons. Previous analysis of SAR data for thrombin had highlighted PPACK as an
effective inhibitor bound at the active site (that is, the site at which the catalytic hydrolysis of the peptide substrate
occurs). The activity of PPACK was believed to be the result of making favorable interactions with several distinct
regions of the active site, most importantly two hydrophobic pockets (labelled as distal and proximal to the catalytic
amino-acid residues) and a polar pocket (the arginine-binding pocket, or in enzyme terminology the S1 pocket). Proline
was considered a good choice for a template because it fulfilled the design criteria that it should make some favorable
interactions in itself and also allow the positioning of substituents which will also make favorable interactions. Analysis
of the X-ray structure revealed that proline makes good interactions with the proximal hydrophobic pocket and allows
the positioning of a potential library of substituents which are likely to make good interactions with the remaining two
pockets. In the absence of an X-ray structure these assumptions would have to be made on the basis of modelling the
enzyme-template complex.

The structure of PPACK and some of its key interactions with thrombin are shown in Figure 1. It was the intention
to design a library of reversible inhibitors exploring a diverse set of substituents in the D-Phe and Arg positions. 

Several sets of substituent lists were prepared using different design criteria. Initially, the N-terminus on proline
was targeted with starting reagents that possessed a carboxylic acid (to form a peptide bond with the template), and
a hydrogen bond donor plus a hydrophobic group (to form contacts with the D pocket). This was later augmented by
a list of sulphonic acids and sulphonyl chlorides (to form a sulphonamide bond with the template). The C-terminus was
initially targeted with starting reagents that possessed bis-amines (to form a peptide bond with the template and
hydrogen bonds in the S1 pocket). This was augmented by a list of amines with aromatic nitro compounds used as
'protected' anilines, and a list in which amines were 'protected' as nitrile compounds. In all cases, 2D and
3D constraints were imposed when searching through the ACD. Two positionings of the template in an associated receptor
conformation were used. The first was derived directly from the proline position in the crystal structure of the covalently
bound PPACK, and the second was derived from a computational simulation of a non-covalently bound analogue of
PPACK.
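The 3D constraints can be pictured as geometric tests on stored conformers. As an illustration, the caption of Table 1 states that the bis-amines for the Arg position were required to be separated by 5-8 Å in the 3D search; a minimal sketch of such a distance test follows. It assumes the separation is measured between amine nitrogen atoms and that each reagent is represented by a list of conformers, each reduced to the coordinates (in Å) of those nitrogens; the function names and coordinates are hypothetical, and the actual 3D search software is not described here.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def pair_in_range(points: list[Coord], lo: float, hi: float) -> bool:
    """True if any two points (e.g. amine nitrogens) lie between lo and hi Å apart."""
    return any(lo <= math.dist(p, q) <= hi for p, q in combinations(points, 2))


def passes_3d_screen(conformers: list[list[Coord]], lo: float = 5.0, hi: float = 8.0) -> bool:
    """A reagent passes if at least one of its stored conformers satisfies the constraint."""
    return any(pair_in_range(nitrogens, lo, hi) for nitrogens in conformers)


# Hypothetical amine-nitrogen coordinates for two bis-amine reagents.
long_diamine = [
    [(0.0, 0.0, 0.0), (4.5, 1.0, 0.0)],  # folded conformer: about 4.6 Å
    [(0.0, 0.0, 0.0), (6.3, 0.5, 0.2)],  # extended conformer: about 6.3 Å
]
short_diamine = [[(0.0, 0.0, 0.0), (2.5, 0.3, 0.0)]]  # about 2.5 Å

print(passes_3d_screen(long_diamine))   # True
print(passes_3d_screen(short_diamine))  # False
```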

Table 1 gives some details of the numbers of compounds considered at each stage of the process for the second
template position (the results from the first template position are very similar). For the sake of clarity, the results of only
one substituent list at each template attachment point are given. The 2D search was not fully refined and includes
many reagents which are not practicable with any simple synthetic route. It also includes many substituents which
would be ruled out of a single-protocol combinatorial approach yet have been successfully included in our final
compound set. It is clear that even after a thorough application of 3D database searching, the virtual library size is still
enormous, and receptor screening and scoring are required to reduce it to a manageable number.
Table 1: Statistics for the number of molecules considered at each stage. The list of substituents for the Arg position
was primary/secondary amines (plus hydrazines) for the 2D search, and bis-amines separated by 5-8 Å for the 3D search.
For the Phe position the list was carboxylic acids for the 2D search, and hydrophobic carboxylic acids with a donor
group 2-3 Å away for the 3D search.

                                   No. of accepted substituents     No. of compounds in virtual
Stages                             Arg position    Phe position     combinatorial library
After 2D ACD screen                4262            8803             37518386
After 3D ACD screen                894             437              390678
After receptor screen              144             145              20880
After binding affinity screen      65              81               5265
After strain energy screen         53              71               3763
Selected synthesis candidates      9               8                72
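The virtual library sizes in Table 1 follow directly from the substituent counts: with one list at each of the two attachment points, the number of compounds at each stage is the product of the accepted Arg-position and Phe-position counts. The short Python check below reproduces the last column from the first two; the counts are copied from Table 1 and the script itself is only illustrative.

```python
# Accepted substituent counts and reported library sizes, copied from Table 1.
stages = [
    ("After 2D ACD screen", 4262, 8803, 37518386),
    ("After 3D ACD screen", 894, 437, 390678),
    ("After receptor screen", 144, 145, 20880),
    ("After binding affinity screen", 65, 81, 5265),
    ("After strain energy screen", 53, 71, 3763),
    ("Selected synthesis candidates", 9, 8, 72),
]

for stage, n_arg, n_phe, reported in stages:
    size = n_arg * n_phe  # one substituent at each of the two attachment points
    assert size == reported, stage
    print(f"{stage:<32}{n_arg:>6} x {n_phe:<6}= {size:>10,}")
```

Because the library size is a product, halving each list roughly quarters the library, which is why per-position screening is so effective at reducing the virtual library.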












The resulting substituent lists were then more thoroughly evaluated using: 2D chemical diversity; visualisation of
the 3D contacts made by the substituents (those which interacted with different parts of the receptor were especially
targeted); further computational evaluation of the predicted binding affinities, interaction energies and physical
properties of the substituents and their enumerated counterparts; and further consideration of synthetic feasibility.
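The text does not specify how 2D chemical diversity was measured. One common approach, shown here purely as an assumed illustration, is to compare substructure fingerprints with the Tanimoto coefficient and to pick a diverse subset with the MaxMin algorithm, which repeatedly adds the candidate whose nearest already-picked neighbour is farthest away. Fingerprints are modelled as sets of "on" bit positions; the substituent names and bit sets are hypothetical.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fps: dict[str, frozenset[int]], n: int, seed: str) -> list[str]:
    """MaxMin diversity selection: repeatedly add the candidate whose nearest
    already-picked neighbour (by Tanimoto distance) is farthest away."""
    picked = [seed]
    while len(picked) < min(n, len(fps)):
        remaining = [name for name in fps if name not in picked]
        best = max(remaining,
                   key=lambda name: min(1.0 - tanimoto(fps[name], fps[p]) for p in picked))
        picked.append(best)
    return picked


# Hypothetical fingerprints for four substituents.
fps = {
    "s1": frozenset({1, 2, 3, 4}),
    "s2": frozenset({1, 2, 3, 5}),
    "s3": frozenset({10, 11, 12}),
    "s4": frozenset({1, 2, 10, 11}),
}
print(maxmin_pick(fps, n=3, seed="s1"))  # ['s1', 's3', 's4']
```

Here s2 is left out because it is nearly identical to the seed s1, which is the behaviour wanted when choosing a few representatives from lists of similar compounds.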

Figures 2 and 3 give examples of the starting materials used in the synthesis of library members based around
the proline template. Not all possible members of the library were enumerated. All substituents at the Phe position
were enumerated with the prolinyl-agmatine moiety (i.e. representing a good Arg position substituent), and all
substituents at the Arg position were coupled to the D-phenylalanyl-proline moiety (i.e. representing a good Phe position
substituent). In addition, a full array involving reagents A1-A4 and B1-B4 was synthesised. The basic synthetic protocols
were modified as necessary to take account of the wide diversity of functionality in the starting materials. In particular,
a solid-phase approach was used with the bis-amines (B2, B3, B5, B8 and B9) and solution methods were used for
the others. The nitrile compounds were reduced in advance of coupling. All solution-phase routes proceeded via
coupling of the activated acids to the proline benzyl ester, which allows for deprotection via hydrolysis or hydrogenation
depending upon the substituents within the acid.
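The library design just described (every Phe-position reagent paired with one reference Arg-position reagent, every Arg-position reagent paired with one reference Phe-position reagent, plus a full A1-A4 × B1-B4 array) can be written as a union of three sets of reagent pairs. In the sketch below, the choice of A2 and B1 as the reference codes is an assumption suggested by the A2B and B1 series in Table 2, and the list lengths (A1-A13, B1-B15) are illustrative only; overlapping pairs are counted once.

```python
from itertools import product


def synthesis_plan(a_codes, b_codes, ref_a, ref_b, array_a, array_b):
    """Union of (A, B) reagent pairs to make; duplicated pairs appear once."""
    plan = {(a, ref_b) for a in a_codes}    # every Phe-position reagent with the reference Arg reagent
    plan |= {(ref_a, b) for b in b_codes}   # every Arg-position reagent with the reference Phe reagent
    plan |= set(product(array_a, array_b))  # the full array
    return plan


a_codes = [f"A{i}" for i in range(1, 14)]  # hypothetical: A1-A13
b_codes = [f"B{j}" for j in range(1, 16)]  # hypothetical: B1-B15
plan = synthesis_plan(a_codes, b_codes, ref_a="A2", ref_b="B1",
                      array_a=a_codes[:4], array_b=b_codes[:4])
print(len(plan))  # 36 = 16 + 13 + 15 - 4 - 4 - 1 + 1
```

Using sets makes the inclusion-exclusion bookkeeping automatic: pairs such as A2B1, which belong to all three designs, are only synthesised once.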

The D-amino acid analogues of array A had free amino groups protected with Boc. Where B was a symmetrical
bis-amine, the amine was attached to acid-labile chlorotrityl resin and the resin washed with dichloromethane and DMF.
Fmoc-proline was attached using TBTU/DIPEA activation (2 eq). After deprotection of the amino group with 20%
piperidine in DMF, the Boc-protected A component was coupled as before. The product was cleaved from the resin with
10% TES in TFA (30 min), evaporated to dryness and triturated with diethyl ether to give the crude product. Where B
was an asymmetric amine, the Boc-protected A component was coupled to proline benzyl ester (1 eq) by activation
with TBTU/TEA. Hydrolysis of the benzyl ester using NaOH (1.1 eq) in acetone/water (1:1) yielded the dipeptide acid, which
was pre-activated (TBTU/TEA) and reacted with the amine (1.1 eq) in DMF (or DMF/water (1:1) for water-soluble B
components). Extraction of the product, followed by deprotection (5% aq TFA), evaporation and trituration with diethyl
ether, yielded the crude product.

The other members of the A array (sulphonic acids, sulphonyl chlorides and α-hydroxy acid) were attached without
further protection. Sulphonyl chlorides were reacted with proline benzyl ester in the presence of TEA (2 eq). The ester
product was hydrolysed as above and B1 coupled via TBTU/TEA pre-activation as above. The product was extracted
with methanol after evaporation of the reaction mixture to dryness.

The 'protected' B compounds were coupled as described above for asymmetric amine B compounds after
appropriate reduction: H2, Pd/C in the case of nitro groups; catalytic transfer hydrogenation (hydrazine hydrate, ethanol,
Pd/C, 85 °C) for B12; and LAH for the remaining nitrile compounds.

The compounds were tested for inhibition of thrombin and trypsin using a colorimetric microplate assay with
synthetic peptide substrates as described by Tapparelli, J. Biol. Chem. 268:4734 (1993). In general, the compounds were
tested as crude products, and the more active compounds were purified and accurate Ki values were experimentally
determined. The results are given in Table 2, and show that almost all of the compounds were active against the two enzymes,
with several compounds showing selectivity for thrombin. The most active compound (A3B1) has a Ki of 41 nanomolar.
Table 2: Inhibition results for molecules synthesised. The Ki values are in micromolar. Where values are in parentheses,
only the crude compounds were tested; generally, activity increased by between 3 and 10 times when pure samples
were used. Where no value is given, the molecules were not active. The errors in the calculation of Ki for purified
compounds are less than 10%.

Compound    Thrombin       Trypsin
A1B1        0.56           0.95
A1B2        (20)           (0.6)
A1B3        (30)           (10)
A1B4        (100)          (-)
A2B1        0.12           0.25
A2B2        0.83           0.19
A2B3        2.8            8.8
A2B4        (50)           (-)
A3B1        0.041          0.60
A3B2        0.30           0.95
A3B4        [illegible]    [illegible]
A4B2        (22)           (0.6)
A4B4        [illegible]    (40)
A2B6        (90)           (-)
A2B7        0.71
A2B9        0.69           1.5
A2B10       4.0            590
A2B11
A2B12
A2B13
A2B14       48
A2B15
A5B1        2.9            0.12
A7B1        0.28           1.0
A8B1        1.6            0.67
A9B1*       0.53           0.52
A10B1       (-)            (9)
A12B1       (-)            (1)
A13B1       (200)          (90)
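Selectivity for thrombin over trypsin can be summarised as the ratio Ki(trypsin)/Ki(thrombin), where values above 1 favour thrombin. The Python sketch below computes this ratio for a subset of the purified compounds whose values are clearly legible in Table 2 (Ki in µM, copied from the table); using this ratio as the selectivity measure is an assumption rather than a definition given in the text.

```python
# Ki values in µM as (thrombin, trypsin) for legible purified entries in Table 2.
ki_um = {
    "A1B1": (0.56, 0.95),
    "A2B1": (0.12, 0.25),
    "A2B2": (0.83, 0.19),
    "A2B3": (2.8, 8.8),
    "A3B1": (0.041, 0.60),
    "A2B9": (0.69, 1.5),
    "A5B1": (2.9, 0.12),
    "A7B1": (0.28, 1.0),
    "A8B1": (1.6, 0.67),
}


def selectivity(ki_thrombin: float, ki_trypsin: float) -> float:
    """Ki(trypsin) / Ki(thrombin); values above 1 favour thrombin."""
    return ki_trypsin / ki_thrombin


ranked = sorted(ki_um, key=lambda c: selectivity(*ki_um[c]), reverse=True)
for compound in ranked:
    print(f"{compound}: {selectivity(*ki_um[compound]):.1f}")
# first line printed: A3B1: 14.6
```

On this measure the most potent compound, A3B1, is also the most thrombin-selective, while A5B1 and A2B2 favour trypsin.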









At the Phe position, the best-scoring substituents were aromatic D-amino acids, which reflects the strict 3D
constraints imposed by the thrombin active site and the need to form good hydrophobic contacts if high affinity is to be
achieved. The process of the invention did produce non-amino acid solutions, but these scored poorly. The best
substituent was p-Br-D-Phe (A3), which is three times more active than the simple Phe derivative. The available starting
material was a racemic mixture, and the resulting diastereoisomers were separated by HPLC. As predicted, one of the
diastereoisomers was at least 100 times less active than the other. Of particular interest are the substituents with polar
functionality (A4, A5 and A7), which have not been thoroughly explored before in PPACK analogues. These were
selected because they were predicted to form additional hydrogen bonds which, if not contributing to affinity, could
enhance selectivity. The poor activity of the sulphonamide derivatives against thrombin was not particularly surprising,
since the design criteria for this substituent list omitted a hydrogen bond to Gly-216. Despite this drawback, the
syntheses were justified because the sulphonamides increased the chemical diversity and allowed the exploration of
different binding modes for the hydrophobic pocket.

At the Arg position, the most active base is agmatine, as expected, since the guanidino group can make excellent
contacts with Asp-189 and Gly-218 at the bottom of the S1 subsite. However, there is great incentive to diverge from
the arginine-like chemistry because of its pharmacokinetic properties and side-effect profile. Diaminopentane (B9) is
active, as would be expected for a lysine analogue (see Brady, Bio Med. Chem 8: 1063 (1995)). The other bis-amines
also have respectable activity, which is of interest because good hydrophobic contacts in this pocket may increase
affinity and selectivity (see Deadman, J. Med. Chem. 38: 1511 (1995)). The activity of the short aniline (B7) is particularly
interesting. It is unlikely that this substituent is long enough to interact directly with Asp-189 (although there could be
a mediating water molecule); instead, it is predicted to form hydrogen bonds to Gly-219 and Ala-190. It was the activity
of this compound which caused us to explore different anilines using the functional group transformation strategy.

Viewed from a second aspect, the invention also provides novel active compounds identified by the process of this
invention. Thus the compounds for which non-parenthesised activity values are given in Table 2 above are deemed to
fall within the scope of the invention, as are all other active PPACK analogues incorporating the 'successful' substituents
that characterise reagents A3, A4, A5, A7 and B7.
Claims

1. A process for drug candidate identification, said process comprising the steps of: 

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface
of a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to 
identify favourable and unfavourable interactions between the binding site and a drug candidate molecule; 

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at 
least one substituent group, said molecular fragment either being capable of being synthesized from reagent 
compounds accessible in substituted form whereby to import said substituent groups on synthesis of said 
molecular fragment or being present in an accessible reagent compound capable of substitution with said
substituent groups by reaction with further accessible reagent compounds; 

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of com- 
pounds taken from each list may be reacted to produce a candidate compound comprising said molecular 
fragment carrying a plurality of substituent groups thereby generating a first virtual library of candidate
compounds being the theoretical set of compounds producible by reaction of the members of said lists, each
member of each list comprising a component common to the other members of that list and a component 
unique within that list; 

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to 
generate a restricted second virtual library of candidate compounds, the operation of said first set of rules 
involving for each member of each list computerised comparison for favourable or unfavourable interactions 
between said computerised model and a structure comprising said molecular fragment and a substituent de- 
riving from the unique component within said list of that member, the molecular fragment and the computerised 
model being held in fixed spatial relationship to each other for said comparison; 

(6) evaluating and ranking by computer the members of said second virtual library for favourable and
unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of
candidate compounds ranked as having favourable interactions; 

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating
steps (4), (5) and (6) to generate an alternative third virtual library; 

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted 
fourth virtual library of candidate compounds comprising compounds which are candidates for synthesis and 
experimental evaluation for drug efficacy; 

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate
compound library;

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship
information;

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5)
and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and exper- 
imental evaluation for drug efficacy; 

(14) synthesising and experimentally evaluating said further compounds for drug efficacy; 
(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above. 

2. A process according to claim 1 wherein in step (7) at least one further molecular fragment is selected from the
third virtual library, whereafter steps (4), (5) and (6) are repeated to generate an alternative third virtual library 
which is subsequently screened in step (8). 

3. A process according to either of claims 1 and 2 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for being analogs of reagents
included in said restricted lists.

4. A process according to any one of claims 1 to 3 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for involving complex
transformation in their synthesis from commercially available reagents.

5. A process according to any one of claims 1 to 4 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for being produced in low
yield in their synthesis from commercially available reagents.

6. A process according to any one of claims 1 to 5 wherein in step (12) said revised set of lists of accessible reagents 
is selected to include reagents excluded from the restricted lists generated in step (5) for requiring significant 
purification following their synthesis from commercially available reagents. 

7. A process according to any one of claims 1 to 6 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for being expensive.

8. A process according to any one of claims 1 to 7 wherein in step (1) said representation is derived from X-ray
crystallographic data for said macromolecule.

9. A process according to any one of claims 1 to 8 wherein in step (4) said lists of accessible reagents are generated 
from a computer database of available chemicals. 

10. A process according to claim 9 wherein in step (4) said lists of accessible reagents are supplemented to include
reagents accessible by transformation of reagents identified from said database.

11. A process according to any one of claims 1 to 10 wherein the model generated in step (2) comprises a
representation of those regions of the binding site capable of interaction with a molecule placed in said binding site, the said
regions being identified according to the nature and geometry of said interaction.

12. A process according to any one of claims 1 to 11 wherein in step (5) said computerised comparison for a reagent
involves in sequence: (i) carrying out a subgraph isomorphism check to establish a match between said unique
component of said reagent and said computerised model, (ii) rejecting reagents for which no match can be found,
(iii) verifying the match for non-rejected reagents by torsional optimization of the rotatable bonds in the unique
component, (iv) calculating the compatibility between the computerised model and a structure comprising the
molecular fragment and the substituent deriving from the unique component of the reagent in the conformation
predicted by step (iii), (v) optionally repeating steps (iii) and (iv) to seek a conformation with enhanced compatibility,
(vi) rejecting reagents for which a preselected degree of compatibility is not found in steps (iv) and (v), (vii)
determining a score indicative of a minimum energy level for said structure within said computerised model with the
structure and position of said molecular fragment held constant, and (viii) ranking the reagents in a list according
to the scores determined in step (vii).

13. A process according to claim 12 wherein in step (vii) scores indicative of strain energy and contributions to energy
level of individual interactions of components of said structure with said computerised model are also determined
and reagents are rejected if such scores exceed pre-selected limits indicative of undesirable conformation or
interaction.

14. Novel active compounds identified by a process according to any one of claims 1 to 13.






EP 0 818 744 A2 



A method of manufacturing a drug substance, said method comprising the steps of: 

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface
of a biological macromolecule; 

(2) generating a computerised model of the functional structure of said binding site which may be used to 
identify favourable and unfavourable interactions between the binding site and a drug candidate molecule; 

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at 
least one substituent group, said molecular fragment either being capable of being synthesized from reagent 
compounds accessible in substituted form whereby to import said substituent groups on synthesis of said 
molecular fragment or being present in an accessible reagent compound capable of substitution with said 
substituent groups by reaction with further accessible reagent compounds; 

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of
compounds taken from each list may be reacted to produce a candidate compound comprising said molecular
fragment carrying a plurality of substituent groups thereby generating a first virtual library of candidate com- 
pounds being the theoretical set of compounds producible by reaction of the members of said lists, each 
member of each list comprising a component common to the other members of that list and a component 
unique within that list; 

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to 
generate a restricted second virtual library of candidate compounds, the operation of said first set of rules 
involving for each member of each list computerised comparison for favourable or unfavourable interactions 
between said computerised model and a structure comprising said molecular fragment and a substituent de- 
riving from the unique component within said list of that member, the molecular fragment and the computerised 
model being held in fixed spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and
unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of
candidate compounds ranked as having favourable interactions; 

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating
steps (4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted 
fourth virtual library of candidate compounds comprising compounds which are candidates for synthesis and 
experimental evaluation for drug efficacy; 

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate com- 
pound library; 

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship
information;

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5)
and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5); 

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and
experimental evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy; 

(15) if required repeating steps (11) to (14) one or more times; 




(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above; 

(17) manufacturing the compound identified in step (16) above; and, optionally,

(18) admixing the compound manufactured in step (17) above with at least one pharmaceutically acceptable 
carrier or excipient. 